Preface

The goal of this book is to present classical mechanics, quantum mechanics, and statistical
mechanics in an almost completely algebraic setting, thereby introducing mathematicians,
physicists, and engineers to the ideas relating classical and quantum mechanics with Lie
algebras and Lie groups. The book should serve as an appetizer, inviting the reader to go
more deeply into this fascinating, interdisciplinary field of science.

Much of the material covered here is not part of standard textbook treatments of classical or 
quantum mechanics (or is only superficially treated there). For physics students who want 
to get a broader view of the subject, this book may therefore serve as a useful complement 
to standard treatments of quantum mechanics. 

We motivate everything as far as possible by classical mechanics. This forced an approach
to quantum mechanics close to Heisenberg's matrix mechanics, rather than the usual
approach dominated by Schrödinger's wave mechanics. Indeed, although both approaches are
formally equivalent, only the Heisenberg approach to quantum mechanics has any
similarity with classical mechanics; and as we shall see, the similarity is quite close. In fact,
the present book emphasizes the closeness of classical and quantum mechanics, and the
material is selected in a way to make this closeness as apparent as possible.

Almost without exception, this book is about precise concepts and exact results in classical
mechanics, quantum mechanics, and statistical mechanics. The structural properties of
mechanics are discussed independently of computational techniques for obtaining
quantitatively correct numbers from the assumptions made. This allows us to focus attention on the
simplicity and beauty of theoretical physics, which is often hidden in a jungle of techniques
for estimating or calculating quantities of interest. The standard approximation
machinery for calculating from first principles explicit thermodynamic properties of materials, or
explicit cross sections for high-energy experiments, can be found in many textbooks and is
not repeated here.

The book originated as course notes from a course given by the first author in Fall 2007,
written up by the second author, and expanded and polished by combined efforts, resulting
in a uniform whole that stands on its own. Part II is mainly based on earlier work by the first
author (Neumaier [183, 184]). The second author acknowledges support by the Austrian
FWF projects START-project Y-237 and IK 1008-N. Thanks go to Roger Balian, Clemens
Elster, Martin Fuchs, Johann Kim, Mihaly Markot, Mike Mowbray, Hermann Schichl, Peter
Schodl, and Tapio Schneider, who contributed through their comments on earlier versions
of parts of the book.

The audience of the course consisted mainly of mathematics students shortly before finishing 
their diploma or doctorate degree and a few postgraduates, mostly with only a limited 
background knowledge in physics. 

Thus we assume some mathematical background knowledge, but only a superficial
acquaintance with physics, at the level of what is available to readers of Scientific American,
say. It is assumed that the reader knows basic properties of vector spaces, linear algebra,
groups, differential equations, topology, and Hilbert spaces. No background in Lie algebras,
Lie groups, or differential geometry is assumed. Rudiments of differential geometry would
be helpful to expand on our somewhat terse treatment of it in Part IV; most material,
however, is completely independent of differential geometry.

While we give precise definitions of all mathematical concepts encountered, and an exten- 
sive index of concepts and notation, we avoid the deeper use of functional analysis and 
differential geometry without being mathematically inaccurate, by concentrating on situ- 
ations that have no special topological difficulties and only need a single chart. But we 
mention where one would have to be more careful about existence or convergence issues 
when generalizing to infinite dimensions. 

On the physics side, we usually first present the mathematical models for a physical theory
before relating the models to reality. This is adequate both for mathematically minded
readers without much physics knowledge and for physicists who know already on a more
elementary level how to interpret the basic settings in terms of real-life examples.

This is an open-ended book. It should whet the appetite for more and lead the reader
into going deeper into the subject.^ Thus many topics are discussed far too briefly for a
comprehensive treatment, and often only the surface is scratched. A term has only so
many hours, and our time to extend and polish the lectures after they were given was
limited, too. We added some material, and would have liked to be more complete in many
respects. Nevertheless, we believe that the topics treated are the fundamental ones, whose
understanding gives a solid foundation to assess the wealth of material on other topics.

We usually introduce physical concepts by means of informal historical interludes, and only 
discuss simple physical situations in which the relevant concepts can be illustrated. We 
refer to the general situation only by means of remarks; however, after reading the book, 
the reader should be able to go deeper into the original literature that treats these topics 
in greater physical depth. 

Part I is an invitation to quantum mechanics, concentrating on giving motivation and
background from history and from classical mechanics. Part II gives a thorough treatment
of the formal part of equilibrium statistical mechanics, viewed in the present context as
the common core of classical and quantum mechanics, and discusses the interpretation of
the theory in terms of models, statistics and measurements. Part III introduces the basics
about Lie algebras and Poisson algebras, with an emphasis on the concepts most relevant to
the conceptual side of physics. Part IV introduces the relevant background from differential
geometry and applies it to classical Hamiltonian and Lagrangian mechanics, to a symplectic
formulation of quantum mechanics, and to Lie groups. Part V applies the concepts to the
study of quantum oscillators (bosons) and spinning systems (fermions), and to the analysis
of experimental spectra, concentrating on the mathematical contents of these subjects.
The book concludes with numerous references and an index of all concepts and symbols
introduced. For an overview of the topics treated, see Section 1.7.

^Some general references for further reading: Barut & Raczka [25], Cornwell [60], Gilmore [92],
and Sternberg [234] for the general theory of Lie algebras, Lie groups, and their representations from a
physics point of view; Wybourne [265] and Fuchs & Schweigert [84] for a more application-oriented
view of Lie algebras; Kac [130] and Neeb [179] for infinite-dimensional Lie algebras; Papousek & Aliev
[188] for quantum mechanics and spectroscopy; van der Waerden [249] for the history of quantum
mechanics; and Weinberg [256] for a (somewhat) Lie algebra oriented treatment of quantum field theory.

We hope that you enjoy reading this book! 

Wien, October 6, 2008 



Arnold Neumaier, Dennis Westra 
Chapter 1



Motivation 



Part I is an invitation to quantum mechanics, concentrating on giving motivation and 
background from history and from classical mechanics. 

The first chapter is an introduction and serves as a motivation for the following chapters.
We will go over different areas of physics and give a short glimpse of the mathematical
point of view. The final section of the chapter outlines the content of the whole book.

For mathematicians, much of the folklore vocabulary of physicists may not be familiar,
but later on in the book, precise definitions in mathematical language will be given.
Therefore, apart from the material introduced in Section 1.6, there is no need to
understand everything in the first chapter on first reading; we merely introduce informal names
for certain concepts from physics and try to convey the impression that these have
important applications to reality and that there are many interesting solved and unsolved
mathematical problems in many areas of theoretical physics.^



1.1 Classical mechanics 

Classical mechanics is the part of physics that started more or less with Isaac Newton 
(1642-1727). We do not dare to give a precise definition of classical mechanics, but simply 
describe some of its fields. 

It was in the period of Newton, Leibniz, and Galileo that classical mechanics was born,
mainly studying planetary motion. Newton wanted to understand why the earth seems to
circle around the sun, why the moon seems to circle around the earth, and why apples (and
other objects) fall down. By analyzing empirical data, he discovered a formula explaining
most of the observed phenomena involving gravity. The philosophically big step Newton
made was to realize that the laws of physics here on earth are the same as the laws of
physics determining the motion of the planets.

^We encourage readers to investigate for themselves some of the abundant literature to get a better
feeling and more understanding than we can offer here.

The motion of a planet is described by its position and velocity at different times. With
Newton's laws it was possible to deduce a set of differential equations involving the
positions and velocities of the different constituents of the solar system. Knowing exactly
all positions and velocities at a given time, one could in principle deduce the positions and
velocities at any other time: the motion of our solar system is a well-posed initial value
problem (IVP). However, an initial error in positions and velocities at time $t_0$ grows
exponentially; at time $t > t_0$ it has changed by a factor of roughly $e^{\lambda(t-t_0)}$. The value of $\lambda$ varies for
different initial conditions; its maximum is called the maximal Lyapunov exponent. A system with maximal Lyapunov
exponent $\lambda = 0$ is called integrable. If $\lambda < 0$, the solutions converge to each other, and
if $\lambda > 0$, the solutions move away from each other. The solar system is apparently not
quite integrable: according to numerical simulations, the maximal Lyapunov exponent for
our solar system seems to be small but positive, with $\lambda^{-1}$ being about five million years
(Laskar [153, 154], Lissauer [158]).
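The exponential error growth $e^{\lambda(t-t_0)}$ can be made concrete with a discrete-time toy system rather than the solar system itself. The sketch below assumes the logistic map $x \mapsto r x(1-x)$ (our choice, not from the text): each step multiplies a tiny error by $|f'(x)|$, so $\lambda$ is estimated as the orbit average of $\ln|f'(x)|$. For $r = 2.5$ the orbit settles at a fixed point and $\lambda = \ln 0.5 < 0$; for $r = 4$ the map is chaotic with $\lambda = \ln 2 > 0$.

```python
import math

def logistic_lyapunov(r, x0, n_steps, n_transient=1000):
    """Estimate the Lyapunov exponent of the map x -> r*x*(1-x).

    A tiny error dx is multiplied by |f'(x)| = |r*(1-2x)| in each step, so after
    n steps it has changed by roughly exp(lambda*n), where lambda is the orbit
    average of ln|f'(x)|.
    """
    x = x0
    for _ in range(n_transient):      # let the orbit settle onto its attractor
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n_steps):
        total += math.log(abs(r * (1.0 - 2.0 * x)))
        x = r * x * (1.0 - x)
    return total / n_steps

print(logistic_lyapunov(2.5, 0.1, 20000))  # about -0.69: nearby orbits converge
print(logistic_lyapunov(4.0, 0.1, 20000))  # about +0.69 (ln 2): nearby orbits separate
```

Averaging $\ln|f'|$ along one long orbit avoids tracking two explicit trajectories, whose separation would quickly saturate at the size of the attractor.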

An important ingredient of classical mechanics is phase space. This is (in our context)
the space in which the positions and velocities of all relevant objects together are
represented as a single point. Thus the motion of all planets together is a single path in phase space. For a
single particle moving in space, there are three spatial directions to specify its position and
three directions to specify its velocity. Hence the phase space of a (free) particle is
six-dimensional; for a system of n astronomical bodies, the dimension is 6n. Low-dimensional
phase spaces are well understood in general. Newton showed that the configuration of
a single planet moving around the sun is stable (in fact, the system is integrable) and that
the motion follows Kepler's laws. Higher-dimensional phase spaces tend to cause
problems. Indeed, for more planets (that is, more than two bodies in the system), deviations
from elliptic motion are predicted, and the question of stability was open for a long time.
The Swedish king Oskar II offered a large sum of money to the scientist who could prove
the stability of our solar system. However, Poincaré showed that already
three objects (one sun, two planets) cause big problems for a possible stability proof of
our solar system, and he received the prize in 1889. The numerical studies from the end of the
last century (quoted above) strongly indicate that the solar system is unstable, though a
mathematical proof is missing.
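A single planet around a fixed sun can be followed as one path in phase space. The sketch below assumes a planar orbit (so phase space is 4-dimensional rather than 6-dimensional), units with $GM = m = 1$, and the velocity-Verlet (leapfrog) integrator, a standard choice of ours rather than anything prescribed by the text. Over one revolution of a circular orbit, the point returns close to its start while the energy and angular momentum stay (nearly) constant, reflecting the integrability of the two-body problem.

```python
import math

def kepler_leapfrog(q, p, dt, steps):
    """Integrate the planar Kepler problem H = |p|^2/2 - 1/|q| (units with G*M = m = 1)
    with the velocity-Verlet (leapfrog) scheme; returns the final phase-space point (q, p)."""
    def force(x, y):
        r3 = (x * x + y * y) ** 1.5
        return -x / r3, -y / r3            # minus the gradient of the potential -1/|q|

    (qx, qy), (px, py) = q, p
    fx, fy = force(qx, qy)
    for _ in range(steps):
        px += 0.5 * dt * fx                # half kick
        py += 0.5 * dt * fy
        qx += dt * px                      # drift
        qy += dt * py
        fx, fy = force(qx, qy)
        px += 0.5 * dt * fx                # half kick
        py += 0.5 * dt * fy
    return (qx, qy), (px, py)

def energy(q, p):
    return 0.5 * (p[0] ** 2 + p[1] ** 2) - 1.0 / math.hypot(q[0], q[1])

def angular_momentum(q, p):
    return q[0] * p[1] - q[1] * p[0]

# Circular orbit of radius 1 has period 2*pi; follow it for one revolution.
q0, p0 = (1.0, 0.0), (0.0, 1.0)
q1, p1 = kepler_leapfrog(q0, p0, 2 * math.pi / 1000, 1000)
print(q1)                                                    # close to (1, 0)
print(energy(q1, p1) - energy(q0, p0))                       # tiny energy drift
print(angular_momentum(q1, p1) - angular_momentum(q0, p0))   # zero up to rounding
```

For a central force the kick changes p by a multiple of q and the drift changes q by a multiple of p, so this scheme conserves angular momentum exactly apart from rounding.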

We now turn from celestial mechanics, where phase space is finite-dimensional, to
continuum mechanics, which has to cope with infinite-dimensional phase spaces. For example,
to describe a fluid, one needs to give the distribution of mass and energy and the local
velocity at all (infinitely many) points in the fluid. The dynamics is now governed by
partial differential equations. In particular, fluid mechanics is dominated by the
Navier–Stokes equations, which still offer a lot of difficult mathematical problems. Showing that
smooth solutions of the three-dimensional equations exist for all times (not only short-time solutions) is one of the Clay Millennium
problems (see, e.g., Ladyzhenskaya [149]), and will be rewarded by one million dollars.

The infinitely many dimensions of the phase space cause serious additional problems. The
Lyapunov exponents now depend on where the fluid starts in phase space, and for
fast-flowing fluids, the maximal Lyapunov exponent is much larger than zero in most parts
of phase space. This results in a phenomenon called turbulence, well known from the
behavior of water. The notion of turbulence is still not well understood mathematically.
Surprisingly enough, the problems encountered with turbulence are of the same kind as the
problems encountered in quantum field theories (QFTs), one of the many instances where
a problem in classical mechanics has an analogue in quantum physics.

Another area of continuum mechanics is elasticity theory, where solids are treated as
continuous, nearly rigid objects, which deform slightly under external forces. The rigidity
assumption is easily verified empirically; try to swim in metal at room temperature! Due
to the rigidity assumption, the behavior is much better understood mathematically than in
the fluid counterpart. The configuration of a solid is close to equilibrium, and the deviations
from the equilibrium position are strongly suppressed (this is rigidity). Hence the rigidity
assumption implies that linear Taylor approximations work well, since the remaining terms
of the Taylor series are small, and the Lyapunov exponent is zero.

Elasticity theory is widely applied in engineering practice. Modern bridges and high rise 
buildings would be impossible without the finite element analyses which determine their 
stability and their vibration modes. In fact, the calculations are done in finite-dimensional 
discretizations, where much of the physics is reducible to linear algebra. Indeed, all con- 
tinuum theories are (and have to be) handled computationally in approximations with 
only finitely many degrees of freedom; in most areas very successfully. The mathematical 
difficulties are often related to establishing a valid continuum limit. 
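To illustrate how a discretized elasticity problem reduces to linear algebra, here is a minimal sketch under assumed conditions: a one-dimensional elastic bar of stiffness EA, fixed at both ends, under a uniform load, solving $-EA\,u'' = \text{load}$ with linear finite elements. The stiffness matrix is tridiagonal, so the Thomas algorithm (Gaussian elimination specialized to tridiagonal matrices) suffices. The function names are ours, for illustration only. For this particular problem the nodal values coincide with the exact solution $u(x) = \text{load}\cdot x(L-x)/(2EA)$.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm: solve a tridiagonal system with sub-diagonal `sub`,
    diagonal `diag` and super-diagonal `sup` (no pivoting)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        denom = diag[i] - sub[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = sup[i] / denom
        d[i] = (rhs[i] - sub[i - 1] * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        u[i] = d[i] - c[i] * u[i + 1]
    return u

def bar_displacements(n_elements, length, EA, load):
    """Linear finite elements for -EA u'' = load on [0, length] with u = 0 at both ends.
    Returns the interior node positions and their displacements."""
    h = length / n_elements
    m = n_elements - 1                  # number of free (interior) nodes
    k = EA / h                          # element stiffness
    diag = [2.0 * k] * m
    off = [-k] * (m - 1)
    rhs = [load * h] * m                # nodal loads for a uniform load
    xs = [h * (i + 1) for i in range(m)]
    return xs, solve_tridiagonal(off, diag, off, rhs)

xs, us = bar_displacements(n_elements=10, length=1.0, EA=2.0, load=3.0)
for x, u in zip(xs, us):
    exact = 3.0 * x * (1.0 - x) / (2 * 2.0)
    print(f"x={x:.1f}  u_FE={u:.6f}  u_exact={exact:.6f}")
```

Real engineering codes work in two or three dimensions with sparse matrices, but the structure (assemble a stiffness matrix, then solve a linear system) is the same.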

In the period 1900–1920, classical mechanics was enriched with special relativity theory
(SRT) and general relativity theory (GRT). In SRT and GRT, space and time merge
into a four-dimensional manifold, called space-time. In GRT, space-time is in general not flat
but has curvature. That means that freely moving objects do not follow straight lines;
in fact, the notion of what straight means is blurred. The trajectory that an object in
free fall (where no forces are exerted on the object) follows is called a geodesic.
The geodesics are determined by the geometry by means of a second-order differential
equation. The preferred direction of time on a curved space-time is no longer fixed, or,
as mathematicians say, 'canonical', but is determined by the observer: the geodesic along
the observer's 4-momentum vector defines the world line of the observer (e.g., a measuring
instrument) and with it its time; the space-like surfaces orthogonal to the points on the
world line define the observer's 3-dimensional space at each moment. When the observer
also defines a set of spatial coordinates around its position, and a measure of time (along
the observer's geodesic), one can say that a chart around the observer has been chosen.

In SRT, distances in space and time are measured with the Minkowski metric, an indefinite
metric (discussed in more detail in Section 1.6) which turns space-time into a pseudo-Riemannian
manifold. Different observers (in SRT) all see the same speed of light. But they see the same
distances only when measured with the Minkowski metric, not with the Euclidean spatial
or temporal metric (which also holds for the orthogonality mentioned above). It follows
that spatial separation and temporal separation between localized systems (for example a
chicken laying an egg and an atom splitting) are different for different observers! But the
difference is observable only when the two systems move at relative velocities that are not
negligible compared with the speed of light; hence Newton couldn't have noticed this deviation from his theory.
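A short numerical check of these statements, as a sketch assuming one spatial dimension, units with c = 1, and the signature convention $(+,-)$ for the interval $\Delta t^2 - \Delta x^2$: a Lorentz boost changes the time and space separations of two events, but leaves the Minkowski interval unchanged, and at everyday velocities the change in the time separation is tiny.

```python
import math

def boost(t, x, v):
    """Lorentz boost along x with velocity v, in units where c = 1 (requires |v| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def interval(dt, dx):
    """Minkowski interval dt^2 - dx^2 (signature (+, -), c = 1)."""
    return dt * dt - dx * dx

# Two events separated by dt = 2, dx = 1 for the first observer.
dt, dx = 2.0, 1.0
for v in (0.0001, 0.6):
    dt2, dx2 = boost(dt, dx, v)
    print(f"v={v}: dt'={dt2:.6f}, dx'={dx2:.6f}, interval={interval(dt2, dx2):.6f}")
```

For v = 0.6 the time separation shrinks from 2 to 1.75 while the interval stays 3; for v = 0.0001 the separations barely change.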
In classical mechanics, time is absolute in the sense that there exists a global time, unique up to time
shifts; the time difference between two events is the same in every coordinate system. The
symmetry group of classical space-time is thus the group generated by time translations, space
translations, rotations, and Galilean boosts (changes to a uniformly moving frame of reference);
this group is the Galilean group. The experimental fact that the speed of light in vacuum is the same for all observers led Einstein to the
conclusion that time is not absolute and that the Galilean boosts should be replaced by
transformations that rotate space and time coordinates into each other. The result was the
theory of special relativity. Special relativistic effects in quantum theory have visible
consequences; for example, without special relativity, gold would be white,
and mercury would be solid at room temperature (Norrby [185]).

SRT is only valid if observers move at fixed velocities with respect to each other. Handling
observers whose relative velocities may vary requires the more general but also more complex
GRT. The metric now depends on the space-time point; it becomes a nondegenerate
symmetric bilinear form on the space-time manifold. The transformations (diffeomorphisms)
relating the metric in one patch to the metric in another patch cannot change the signature.
Hence the signature is the same for all observers.

When time becomes an observer-dependent quantity, so does energy. Local energy
conservation is still well defined, described by a conservation law for the resulting differential 
equations. The differential equations are covariant, meaning that they make sense in any 
coordinate system. For a large system in general relativity, the definition of a total energy 
which is conserved, i.e., time-independent, is however problematic, and well-defined only if 
the system satisfies appropriate boundary conditions such as asymptotic flatness, believed 
to hold for the universe at large. Finally, if the system is dissipative, there is energy loss, 
and the local conservation law is no longer valid. Not even the rate of energy loss is well 
defined. Dissipative general relativity has not yet found its final mathematical form. 



1.2 Thermodynamics and statistical mechanics 

A major area both in classical and in quantum mechanics is thermodynamics; indeed, the 
two realms of physics meet there very closely. 

Thermodynamics was developed especially during the industrial revolution in England. Maxwell
wrote many papers on a mathematical foundation of thermodynamics. With the
establishment of a molecular world view, the thermodynamical machinery slowly got replaced by
statistical mechanics, where macroscopic properties like heat capacity, entropy, and
temperature were explained through considerations of the statistical properties of a big population
of particles. The first definite treatise on statistical thermodynamics is by Gibbs [90],^ who
also invented much of the modern mathematical notation in physics, especially the notation
for vector analysis.

One can say that quantum mechanics and classical mechanics are closest when viewed in
the context of statistical mechanics; indeed, Gibbs' account of statistical mechanics had to
be altered very little after quantum mechanics revolutionized the whole of science. In this
course, we shall always emphasize the closeness of classical and quantum reasoning, and
put things in a way to make this closeness as apparent as possible.

^Today, this book from 1902 is still easily readable.

An important ingredient in statistical mechanics is a phase space density ρ playing the
role of a measure to calculate probabilities; the expectation value of a function f is given
by

$$\langle f \rangle = \int \rho f, \tag{1.1}$$

where the integral indicates integration with respect to the so-called Liouville measure in
phase space.

In the quantum version of statistical mechanics, the density ρ gets replaced by a linear
operator ρ on Hilbert space called the density matrix, the functions become linear
operators, and we have again (1.1), except that the integral is now interpreted as the quantum
integral,

$$\int f = \operatorname{tr} f, \tag{1.2}$$

where tr f denotes the trace of a trace class operator. We shall see that the algebraic
properties of the classical integral and the quantum integral are so similar that using the
same name and symbol is justified.
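In the simplest quantum system, a spin-1/2 system with a 2 × 2 density matrix, the quantum integral (1.2) is a finite trace. The sketch below assumes the standard Bloch parametrization $\rho = (I + r\cdot\sigma)/2$ with Pauli matrices $\sigma$ (not introduced in the text); then $\langle \sigma_k \rangle = \operatorname{tr}(\rho\,\sigma_k) = r_k$ and $\int \rho = \operatorname{tr}\rho = 1$, the analogue of a normalized phase space density.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def expectation(rho, f):
    """Quantum expectation <f> = int rho f = tr(rho f), as in (1.1) and (1.2)."""
    return trace(matmul(rho, f))

SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

def qubit_density(rx, ry, rz):
    """Density matrix rho = (I + r . sigma)/2 of a spin-1/2 system (valid for |r| <= 1)."""
    return [[(1 + rz) / 2, (rx - 1j * ry) / 2],
            [(rx + 1j * ry) / 2, (1 - rz) / 2]]

rho = qubit_density(0.3, 0.4, 0.5)
print(trace(rho))                       # normalization: 1
print(expectation(rho, SIGMA_X).real)   # 0.3
print(expectation(rho, SIGMA_Z).real)   # 0.5
```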

A deeper justification for the quantum integral becomes visible if we introduce the Lie
product^

$$f \angle g := \begin{cases} \{g,f\} & \text{in the classical case,}\\[2pt] \dfrac{i}{\hbar}\,[f,g] & \text{in the quantum case,} \end{cases} \tag{1.3}$$

unifying the classical Poisson bracket

$$\{f,g\} := \partial_q f \cdot \partial_p g - \partial_q g \cdot \partial_p f$$

on the algebra E = C^∞(Ω) of smooth functions on phase space Ω = ℝ^n × ℝ^n, and the
quantum commutator

$$[f,g] := fg - gf$$

on the algebra E = Lin C^∞(ℝ^n) of linear operators on the space of smooth functions on
configuration space. (Here i = √−1 (complex numbers will figure prominently in this
book!) and ħ is Planck's constant in the form introduced by Dirac [64]. Planck had
used instead the constant h = 2πħ, which caused many additional factors of 2π in formulas
discovered later.) The Lie product is in both cases an antisymmetric bilinear map from
E × E to E and satisfies the Jacobi identity; see Chapter 8 for precise definitions.

In the classical case, the fact that integration and differentiation are inverse operations
implies that the integral of a derivative of a function vanishing at infinity is zero.
Integrating by parts turns both terms of the Poisson bracket into the same expression
−∫ f ∂_q · ∂_p g (mixed partial derivatives commute), so the traditional definition of the
Poisson bracket implies that, for functions f, g vanishing at infinity,

$$\int f \angle g = 0. \tag{1.4}$$
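Property (1.4) can be checked on a grid. This is a sketch under stated assumptions: instead of functions vanishing at infinity we take periodic functions on the torus [0, 2π)² (one degree of freedom), where there is no boundary and boundary terms vanish for the same reason, and we replace derivatives by periodic central differences. Central differences are antisymmetric under summation by parts, so the discrete integral of the bracket vanishes up to rounding, even though the bracket itself is far from zero. The sample functions f and g are arbitrary choices; the overall sign convention of the bracket does not affect the result.

```python
import math

N = 64
h = 2 * math.pi / N
grid = [i * h for i in range(N)]

def d_q(u):
    """Periodic central difference in q (first index)."""
    return [[(u[(i + 1) % N][j] - u[(i - 1) % N][j]) / (2 * h) for j in range(N)] for i in range(N)]

def d_p(u):
    """Periodic central difference in p (second index)."""
    return [[(u[i][(j + 1) % N] - u[i][(j - 1) % N]) / (2 * h) for j in range(N)] for i in range(N)]

def poisson(f, g):
    """Discrete {f, g} = d_q f * d_p g - d_q g * d_p f on the periodic grid."""
    fq, fp, gq, gp = d_q(f), d_p(f), d_q(g), d_p(g)
    return [[fq[i][j] * gp[i][j] - gq[i][j] * fp[i][j] for j in range(N)] for i in range(N)]

def integrate(u):
    """Riemann sum over the torus (exact for trigonometric polynomials of low degree)."""
    return sum(map(sum, u)) * h * h

f = [[math.sin(q) * math.cos(2 * p) + math.cos(q + p) for p in grid] for q in grid]
g = [[math.exp(math.cos(q)) * math.sin(p) for p in grid] for q in grid]

bracket = poisson(f, g)
print(max(abs(v) for row in bracket for v in row))   # clearly nonzero pointwise
print(integrate(bracket))                             # zero up to rounding
```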



^The symbol ∠, frequently used in the following, was created in Neumaier [183] explicitly for the
purpose of unifying quantum mechanics and classical mechanics. It is a stylized inverted capital letter L
and should be read as "Lie".
Remarkably, in the quantum case, (1.4) is valid for Hilbert-Schmidt operators f, g, since
then tr fg = tr gf, so that

$$\int f \angle g = \operatorname{tr} \frac{i}{\hbar}[f,g] = \frac{i}{\hbar}\,(\operatorname{tr} fg - \operatorname{tr} gf) = 0.$$

Thus the quantum integral behaves just like the Liouville integral!
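In finite dimensions every matrix is Hilbert-Schmidt, so the quantum statements can be tested directly on random complex matrices. The sketch below assumes units with ħ = 1 and checks three properties of the quantum Lie product from (1.3): antisymmetry, the Jacobi identity, and the vanishing of the quantum integral (trace) of f ∠ g, although f ∠ g itself is a nonzero matrix.

```python
import random

HBAR = 1.0  # assumption: units with hbar = 1

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lie(f, g):
    """Quantum Lie product f ∠ g = (i/hbar)(fg - gf), as in (1.3)."""
    fg, gf = matmul(f, g), matmul(g, f)
    n = len(f)
    return [[1j / HBAR * (fg[i][j] - gf[i][j]) for j in range(n)] for i in range(n)]

def add(*ms):
    n = len(ms[0])
    return [[sum(m[i][j] for m in ms) for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def max_abs(a):
    return max(abs(x) for row in a for x in row)

def random_matrix(n, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
f, g, h = (random_matrix(4, rng) for _ in range(3))
print(max_abs(add(lie(f, g), lie(g, f))))                                  # antisymmetry
print(max_abs(add(lie(f, lie(g, h)), lie(g, lie(h, f)), lie(h, lie(f, g)))))  # Jacobi identity
print(abs(trace(lie(f, g))), max_abs(lie(f, g)))                           # tr vanishes, f ∠ g does not
```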

Understanding statistical physics requires the setting of Hamiltonian mechanics, which 
exists both in a classical and a quantum version. 

Much of classical mechanics can be understood in both a Hamiltonian formulation and a
Lagrangian formulation; cf. Chapter 12. In the Hamiltonian formulation, the basic object is
the Hamiltonian function H on the phase space Ω, which gives the value of the energy at an
arbitrary point of phase space. Specifying H essentially amounts to specifying the physical
system under consideration. Often the phase space Ω is the cotangent bundle T*M of a
manifold M. In the Lagrangian formulation, the main object is a Lagrangian function L
on the tangent bundle TM of a manifold M. The Lagrangian is thus not a function on
phase space and has no simple physical interpretation, but in some sense it plays a more
fundamental role than the Hamiltonian, since it survives the transition to the currently most
fundamental physical theory, quantum field theory. The passage between the Hamiltonian
and the Lagrangian formulation is straightforward in simple cases but may cause problems,
especially in so-called gauge theories (pronounced "gaidge" in English transcription).
Gauge theories are outside the scope of this book; the curious reader is referred to the vast 
literature. 

In the Hamiltonian formulation, the time dependence of a function f on the phase space is
determined by the classical Heisenberg equation

$$\frac{d}{dt} f = H \angle f = \{f, H\}, \tag{1.5}$$

where H is the Hamiltonian function defined above. It is important to note that the
Hamiltonian function determines the time evolution. Suppose we can solve the differential
equations; then we have an operator U(s,t) that maps objects at a time s to corresponding
objects at a time t. Clearly, the composition of the operators gives U(s,s')U(s',t) = U(s,t).
If the Hamiltonian is independent of time (this amounts to assuming that there are no
external forces acting on the system), the maps U form a so-called one-parameter group,
since one can write

$$U(s,t) = e^{(t-s)\,\mathrm{ad}_H},$$

with the associated Hamiltonian vector field ad_H, which is determined by H. The vector
field ad_H generates shifts in time. In terms of ad_H, the multiplication is given by

$$U(r,s)\,U(s,t) = U(r,t), \qquad e^{s\,\mathrm{ad}_H}\, e^{t\,\mathrm{ad}_H} = e^{(s+t)\,\mathrm{ad}_H},$$

and the inverse is given by

$$U(r,s)^{-1} = U(s,r), \qquad \bigl(e^{t\,\mathrm{ad}_H}\bigr)^{-1} = e^{-t\,\mathrm{ad}_H}.$$
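The group laws can be verified for a simple example. This sketch assumes the harmonic oscillator H = (p² + q²)/2 with one degree of freedom and lets ad_H act on linear observables f = a q + b p, stored as coefficient pairs (a, b). By (1.5), {f, H} = a p − b q, so ad_H is the 2 × 2 matrix below; the exponential is computed from its Taylor series, adequate only for moderate |t|. The first column of e^{t ad_H} reproduces the familiar solution q(t) = q cos t + p sin t.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm(A, t, terms=40):
    """exp(tA) for a 2x2 matrix A via its Taylor series (fine for moderate |t| * ||A||)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[t * x / k for x in row] for row in matmul(term, A)]   # (tA)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) < tol for i in range(2) for j in range(2))

# ad_H on f = a*q + b*p for H = (p^2 + q^2)/2: {f, H} = a*p - b*q, i.e. (a, b) -> (-b, a).
AD_H = [[0.0, -1.0], [1.0, 0.0]]

s, t = 0.3, 1.1
print(close(matmul(expm(AD_H, s), expm(AD_H, t)), expm(AD_H, s + t)))   # group law
print(close(matmul(expm(AD_H, t), expm(AD_H, -t)), [[1, 0], [0, 1]]))   # inverse
E = expm(AD_H, 0.9)
print(E[0][0], E[1][0])   # cos 0.9 and sin 0.9: q(t) = q cos t + p sin t
```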

Given another Hamiltonian function H', we get another (Hamiltonian) vector field ad_{H'}
and another one-parameter group. The "time" parameter t then has a different physical
interpretation; for example, if H' is a component of the momentum vector (or the angular
momentum vector), then t corresponds to a translation (or rotation) in the corresponding
coordinate direction. Combining these groups for all H' for which the initial value problem
determined by ad_{H'} is well posed, we get an infinite-dimensional Lie group. (See Sections
8.3 and 11.7 for a definition of Lie groups.) Thus, classical mechanics can be understood
in terms of infinite-dimensional groups! To avoid technical complications, we shall, however,
mainly be concerned with the cases where we can simplify the system such that the groups
are finite-dimensional. In the present case, to obtain a finite-dimensional group, one either
picks a nice subgroup (this involves understanding the symmetries of the system) or
makes a partial discretization of phase space.

Most of our discussions will be restricted to conservative systems, which can be described by
Hamiltonians. However, these only describe systems which can be regarded as isolated from
the environment, apart from influences that can be specified as external forces. Many
real-life systems (and, strictly speaking, all systems with the exception of the universe as a whole)
interact with the environment; indeed, if it were not so, they would be unobservable!

Ignoring the environment is possible in so-called reduced descriptions, where only the
variables of the system under consideration are kept. This usually results in differential
equations which are dissipative. In a dissipative system, the energy dissipates, which means
that some energy is lost into the unmodelled environment. Due to the energy loss, going
back in time is not well defined in infinite-dimensional spaces; the initial value problem
is solvable only in the forward time direction. Hence we cannot find an inverse for the
translation operators U(s,t), and these are defined only for s ≤ t. Therefore, in the most
general case of interest, the dissipative infinitesimal generators do not generate a group,
but only a semigroup. A well-known example is heat propagation, described by the heat
equation. Its solution forward in time is a well-posed initial value problem, whereas the
solution backward in time suffers from uncontrollable instability. Actually, many dissipative
systems are not even described by a Hamiltonian dynamics, but the semigroup property of
the flow they generate still remains valid.
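The heat equation makes the semigroup property explicit. This sketch assumes the one-dimensional heat equation u_t = u_xx on [0, π] with u = 0 at both ends, so that a solution is a sine series and evolving by time t multiplies the k-th coefficient by e^{−k²t}. For t ≥ 0 these maps compose as a semigroup; formally running time backwards multiplies high modes by e^{k²|t|}, so an invisible perturbation explodes.

```python
import math

def heat_flow(coeffs, t):
    """Evolve the sine-series coefficients c_1, c_2, ... of a solution of u_t = u_xx
    on [0, pi] (u = 0 at both ends) by time t: c_k -> exp(-k^2 t) * c_k."""
    return [c * math.exp(-(k * k) * t) for k, c in enumerate(coeffs, start=1)]

c = [1.0, 0.5, 0.25]
print(heat_flow(heat_flow(c, 0.1), 0.2))   # same as evolving by 0.3 at once
print(heat_flow(c, 0.3))

noise = [0.0] * 39 + [1e-10]               # tiny perturbation in mode k = 40
print(heat_flow(noise, 0.05)[-1])          # forward: damped to practically nothing
print(heat_flow(noise, -0.05)[-1])         # backward: amplified by exp(80)
```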



1.3 Quantum mechanics 

A point of view on quantum mechanics is that it is a deformation of classical mechanics.
The deformation parameter is Planck's constant ħ.

Since in daily life we do not really see much of quantum physics (try tunnelling through
a wall), one requires that the so-called correspondence principle holds. The correspondence
principle states that the formulas of quantum physics should turn into corresponding
formulas of classical mechanics in the limit ħ → 0. This limit is called the classical
limit. However, ħ is a constant of Nature and not a parameter that can be changed
in experiments; thus this 'limit' must be properly interpreted. The action of a physical
system is a certain functional S that gives rise to equations of motion and has the same
units as ħ. The right way to think of the classical limit is therefore the limit when the
dimensionless quotient S/ħ becomes arbitrarily large, that is, the limiting case of very
large S. Thus classical mechanics appears as the limit of quantum mechanics when all
values of the action involved are huge when measured in units of ħ, which is the case
when the systems are sufficiently large.
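A rough worked estimate shows how large S/ħ is for an everyday system. The sketch assumes a small-amplitude pendulum (mass 1 kg, length 1 m, amplitude 0.1 rad, illustrative values of ours) and uses the action per oscillation of a harmonic oscillator, energy times period. The result is of order 10³³, whereas atomic angular momenta are of order ħ, so there S/ħ is of order 1.

```python
import math

H_PLANCK = 6.62607015e-34          # Planck's constant h in J s (exact SI value)
HBAR = H_PLANCK / (2 * math.pi)     # Dirac's hbar = h / (2 pi)

def pendulum_action(mass, length, amplitude, g=9.81):
    """Action per oscillation, energy * period, of a small-amplitude pendulum
    (harmonic approximation; SI units)."""
    energy = 0.5 * mass * g * length * amplitude ** 2
    period = 2 * math.pi * math.sqrt(length / g)
    return energy * period

ratio = pendulum_action(1.0, 1.0, 0.1) / HBAR
print(f"S/hbar for the pendulum: {ratio:.2e}")   # roughly 1e33
```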

Keeping only the terms linear in ħ, one obtains so-called semiclassical approximations,
which are intermediate between quantum mechanics and the classical limit and can often be
used for quite small systems.

The value of Planck's constant is approximately 6.6 · 10⁻³⁴ J s; its smallness is the reason
why we do not often encounter quantum phenomena on our length and time scales.
A nice gedanken experiment, which goes under the name Schrödinger's cat, illustrates
(though in a philosophically questionable way) quantum mechanics in terms of daily-life
physics. In short, the experiment goes as follows. Suppose we have put a cat in a box, and in
the same box we have a single radioactive nucleus. The state of the nucleus is determined
by the laws of quantum physics; the nucleus can disintegrate, but we don't know when.
The process of disintegration is described by a probability distribution that is dictated by
the laws of quantum mechanics. Now suppose we link a detector and a gun to the nucleus;
if the nucleus disintegrates, the gun, which is aimed at the cat, goes off and kills the
cat. The state of the cat, dead or alive, is now given by a quantum mechanical probability
distribution. But common sense expects the cat to be either dead or alive. . . .

An important object in quantum mechanics is the wave function. The wave function is
a complex-valued function of space and time coordinates that is square-integrable over
the space coordinates for all times t. Because of the integrability, one always normalizes
the wave function ψ to have total weight ∫|ψ|² = 1, and then |ψ|² is interpreted as a
probability density. Indeed, the uranium nucleus in the gedanken experiment above
is described by a wave function. In a first course on quantum mechanics, one postulates the
time-dependent Schrödinger equation

$$i\hbar \frac{\partial}{\partial t}\psi = H\psi$$

for a single particle described by the wave function ψ, where H is the Hamiltonian, now
an operator. The Schrödinger equation describes the dynamics of the wave function and
thus of the particle.

In N-particle quantum mechanics, the Hamiltonian is a second-order differential operator
with respect to the spatial coordinates. According to the present state of knowledge, the
fundamental description of nature is, however, given not in terms of particles but in terms of
fields. The quantum mechanics of fields is called quantum field theory; the Hamiltonian
is now an expression composed of field operators rather than differential operators. While
nonrelativistic quantum mechanics may be treated equivalently as N-particle mechanics or
as nonrelativistic quantum field theory, the relativistic quantum mechanics of more than
one particle (see, e.g., the survey by Keister & Polyzou [133]) is somewhat clumsy, and
one uses almost exclusively relativistic quantum field theory for the description of multiple
relativistic particles.




Although very important in physics, the mathematical theory of quantum fields is well developed only in the case of space-time dimensions $< 4$. While this covers important applications such as quantum wires (which have only one important spatial dimension), it does not cover the most important case, the 4-dimensional space-time we live in. For example, quantum electrodynamics (QED), the most accurate of all quantum theories, exists only in the form of perturbative expansions (or other approximation schemes) and provides mathematically well-defined coefficients of a series in the fine structure constant $\alpha$ (whose experimental value is about 1/137), which is believed to be divergent (Dyson [66]). The first few terms provide approximations to $\alpha$-dependent numbers like the magnetic moment $g(\alpha)$ of the electron, which match experimental data to 12-digit accuracy, or the Lamb shift $\Delta E_{\rm Lamb}(\alpha)$ of hydrogen, whose explanation in 1947 by Julian Schwinger ushered in the era of quantum field theory. However, the value of a divergent asymptotic series at a finite value of $\alpha$ is mathematically ill-defined; no consistent definition of functions such as $g(\alpha)$ or $\Delta E_{\rm Lamb}(\alpha)$ is known to which the series would be asymptotic.

Finding a mathematically consistent theory of a class of 4-dimensional quantum field the- 
ories (quantum Yang-Mills gauge theory with a compact simple, nonabelian gauge group, 
believed to be the most accessible case) is another of the Clay Millennium problems whose 
solution is worth a million dollars. 

Many quantum observations, for example scattering processes, have a stochastic character, so probabilities appear frequently in quantum mechanics. In particular, in the case of a single particle, the probability density for observing the particle at $x$ is $|\psi(x)|^2$. Physicists and chemists occasionally view a scaled version of the probability distribution $|\psi|^2$ as a charge density. The justification is that in a population of a great number of particles that are all subject to the same Schrodinger equation, the particles will distribute themselves more or less according to the probability distribution of a single particle.
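The normalization and the probabilistic reading of $|\psi|^2$ can be illustrated numerically. The following is a minimal NumPy sketch, assuming a one-dimensional Gaussian wave packet sampled on a uniform grid; the grid, the width, and the wave number `k0` are illustrative choices, not taken from the text.

```python
import numpy as np

# Uniform grid on [-10, 10]; integrals are approximated by Riemann sums.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

# Unnormalized Gaussian wave packet with mean wave number k0 (illustrative values).
k0 = 2.0
psi = np.exp(-x**2 / 4.0) * np.exp(1j * k0 * x)

# Normalize so that the total weight sum(|psi|^2) * dx equals 1.
psi = psi / np.sqrt(np.sum(np.abs(psi)**2) * dx)

density = np.abs(psi)**2                      # probability density |psi(x)|^2
total = np.sum(density) * dx                  # 1 up to rounding
prob_right = np.sum(density[x > 0]) * dx      # probability of observing x > 0
```

The phase factor $e^{ik_0x}$ drops out of $|\psi|^2$, so the packet is found on either side of the origin with probability close to $1/2$.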

In quantum mechanics, the classical functions very often go over into corresponding operators defined on a dense subspace of a suitable separable Hilbert space. For example, the momentum in the $x$-direction of a particle described by a wave function $\psi$ can be described by the operator $-i\hbar\partial_x$. As we shall see, this process of quantization has interesting connections to the representation theory of Lie algebras. Using the correspondence between classical functions and operators, one deduces the Hamiltonian for an electron of the hydrogen atom, the basis for an explanation of atomic physics and the periodic system of elements.
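As a small illustration of the momentum operator $-i\hbar\partial_x$, here is a sketch (assuming units with $\hbar = 1$ and a second-order central difference in place of the exact derivative) showing that a plane wave $e^{ikx}$ is, to discretization accuracy, an eigenfunction with eigenvalue $\hbar k$.

```python
import numpy as np

hbar = 1.0                      # units with hbar = 1 (assumption for illustration)
x = np.linspace(0.0, 2 * np.pi, 2001)
dx = x[1] - x[0]
k = 3.0
psi = np.exp(1j * k * x)        # plane wave e^{ikx}

def momentum(psi, dx, hbar=1.0):
    """Apply -i*hbar*d/dx with central differences; returns values at interior points."""
    dpsi = (psi[2:] - psi[:-2]) / (2 * dx)
    return -1j * hbar * dpsi

p_psi = momentum(psi, dx, hbar)
ratio = p_psi / psi[1:-1]       # close to the eigenvalue hbar*k at every interior point
```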

The hydrogen atom is the quantum version of the 2-body problem of Newton and is the 
simplest of a large class of problems for which one can explicitly get the solutions of the 
Schrodinger equation - it is integrable in a sense paralleling the classical notion. Unfortu- 
nately, integrable systems are not very frequent; they seem to exist only for problems with 
finitely many degrees of freedom, for quantum fields living on a 2-dimensional space-time, 
and for noninteracting theories in higher dimensions. (Whether there are exactly solvable
interacting local 4-dimensional field theories is an unsolved problem.) Nevertheless, the
hydrogen atom and other integrable systems are very important since one can study in these 
simple models the features which in some approximate way still hold for more complicated 
physical systems. 
Given a solution $\psi$ to the Schrodinger equation, normalized to satisfy $\psi^*\psi = 1$ (where $\psi^*$ is the adjoint linear functional, in finite dimensions the conjugate transpose), one obtains a density operator $\rho = \psi\psi^*$, which is a Hermitian, positive semidefinite rank-one operator of trace $\operatorname{tr}\rho = \psi^*\psi = 1$. This type of density operator characterizes so-called pure states; the nomenclature coincides here with that of the mathematical theory of C*-algebras.
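The listed properties of a pure-state density operator are easy to confirm in finite dimensions. A minimal NumPy sketch, assuming a random normalized vector in $\mathbb{C}^4$ as a stand-in for $\psi$:

```python
import numpy as np

rng = np.random.default_rng(0)
# A random state vector in C^4, normalized so that psi^* psi = 1.
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi = psi / np.linalg.norm(psi)

rho = np.outer(psi, psi.conj())       # rho = psi psi^*
eigvals = np.linalg.eigvalsh(rho)     # real eigenvalues of the Hermitian matrix rho

is_hermitian = np.allclose(rho, rho.conj().T)
trace = np.trace(rho).real            # tr rho = psi^* psi
rank = np.linalg.matrix_rank(rho)
```

Besides Hermiticity, positivity, rank one, and unit trace, a pure state also satisfies $\rho^2 = \rho$, which the test below checks as well.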

For conservative systems, the Hamiltonian is a self-adjoint linear operator. We assume that the quantum system is confined to a large box, a frequently employed artifice in quantum mechanics, which in a rigorous treatment must be removed later by going to the so-called thermodynamic limit, where the box contains an arbitrarily large ball. Under this circumstance the spectrum of the Hamiltonian is discrete. The eigenvalues of the Hamiltonian correspond to energy levels, but only energy differences are observable (e.g., as spectral lines), and one generally shifts the Hamiltonian operator such that the lowest eigenvalue is zero. By the spectral theorem, the eigenvalues

$$ 0 = E_0 \le E_1 \le E_2 \le \cdots $$

are real, and we can find an orthonormal set of eigenvectors $\psi_k$ (in particular normalized to satisfy $\psi_k^*\psi_k = 1$) such that

$$ H\psi_k = E_k\psi_k $$

and

$$ H = \sum_k E_k\psi_k\psi_k^*. \tag{1.7} $$

$\psi_0$ is called a ground state, the $E_k$ are the energy levels, and $E_1 - E_0 \ge 0$ is called the energy gap. The energy gap is positive iff the smallest eigenvalue is simple, i.e., iff the ground state is unique up to a phase, a constant factor of absolute value 1. The eigenvectors are the solutions of the time-independent Schrodinger equation

$$ H\psi = E\psi, \tag{1.8} $$

and the ground state is the solution of minimal energy $E_0$. With our normalization of energies, $E_0 = 0$ and hence $H\psi_0 = 0$, implying that the ground state is a time-independent solution of the time-dependent Schrodinger equation (1.6). The other eigenvectors $\psi_k$ lead to time-dependent solutions $\psi_k(t) = e^{-itE_k/\hbar}\psi_k$, which oscillate with the angular frequency $\omega_k = E_k/\hbar$. This gives Planck's basic relation

$$ E = \hbar\omega \tag{1.9} $$

relating energy and angular frequency. (In terms of the ordinary frequency $\nu = \omega/2\pi$ and Planck's original constant $h = 2\pi\hbar$, this reads $E = h\nu$.) The completeness of the spectrum and the superposition principle now imply that for a nondegenerate spectrum (where all energy levels are distinct),

$$ \psi(t) = \sum_k a_k e^{-itE_k/\hbar}\psi_k $$

is the general solution of the time-dependent Schrodinger equation. (In the degenerate case, a more complicated confluent formula is available.) Thus the time-dependent Schrodinger equation is solvable in terms of the time-independent Schrodinger equation, or equivalently by the spectral analysis of the Hamiltonian. This is the reason why spectra are at the center of attention in quantum mechanics. The relation to observed spectral lines, which gave rise to the name spectrum for the list of eigenvalues, is discussed in Chapter 17.
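The reduction of the time-dependent problem to spectral analysis can be traced numerically. The sketch below assumes units with $\hbar = 1$ and uses a random Hermitian $5\times 5$ matrix as a hypothetical stand-in Hamiltonian; it shifts the spectrum so that $E_0 = 0$, rebuilds $H$ from (1.7), and checks that $\psi(t) = \sum_k a_k e^{-itE_k/\hbar}\psi_k$ satisfies $i\hbar\dot\psi = H\psi$ by a central difference.

```python
import numpy as np

hbar = 1.0                                   # units with hbar = 1 (assumption)
rng = np.random.default_rng(1)
n = 5
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2                     # random Hermitian stand-in Hamiltonian

E, V = np.linalg.eigh(H)                     # ascending eigenvalues; columns of V are orthonormal
H = H - E[0] * np.eye(n)                     # shift so that the lowest eigenvalue is zero
E = E - E[0]

# Spectral decomposition (1.7): H = sum_k E_k psi_k psi_k^*
H_rebuilt = sum(E[k] * np.outer(V[:, k], V[:, k].conj()) for k in range(n))

psi0 = rng.normal(size=n) + 1j * rng.normal(size=n)
psi0 /= np.linalg.norm(psi0)
a = V.conj().T @ psi0                        # expansion coefficients a_k = psi_k^* psi(0)

def psi_t(t):
    """General solution psi(t) = sum_k a_k exp(-i t E_k / hbar) psi_k."""
    return V @ (a * np.exp(-1j * t * E / hbar))

# Check i hbar dpsi/dt = H psi with a central difference in t.
t, h = 0.7, 1e-5
lhs = 1j * hbar * (psi_t(t + h) - psi_t(t - h)) / (2 * h)
rhs = H @ psi_t(t)
```

`numpy.linalg.eigh` is used because it exploits Hermiticity and returns real, sorted eigenvalues, which matches the ordering $0 = E_0 \le E_1 \le \cdots$ above.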

The spectral decomposition (1.7) also provides the connection to quantum statistical mechanics. A thorough discussion of equilibrium statistical mechanics emphasizing the quantum-classical correspondence will be given in Part II. Here we only scratch the surface. Under sufficiently idealized conditions, a thermal quantum system is represented as a so-called canonical ensemble, characterized by a density operator of the form

$$ \rho = Z^{-1}e^{-\beta H}, \qquad Z = \operatorname{tr}e^{-\beta H}, \qquad \beta = \frac{1}{kT}, $$

with $T$ the temperature and $k$ the Boltzmann constant, a tiny constant with approximate value $1.38\cdot 10^{-23}\,\mathrm{J/K}$. Hence we get

$$ \rho = Z^{-1}\sum_k e^{-\beta E_k}\psi_k\psi_k^* = Z^{-1}\big(\psi_0\psi_0^* + e^{-\beta E_1}\psi_1\psi_1^* + \dots\big). $$

At room temperature, $T \approx 300\,\mathrm{K}$, hence $\beta \approx 2.4\cdot 10^{20}\,\mathrm{J}^{-1}$. Therefore, if the energy gap $E_1 - E_0$ is not too small compared with $kT \approx 4\cdot 10^{-21}\,\mathrm{J}$, then $Z \approx 1$, it is enough to keep a single term, and we find that $\rho \approx \psi_0\psi_0^*$. Thus the system is approximately in the ground state.
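The room-temperature arithmetic and the dominance of the ground state can be reproduced directly. A minimal sketch, assuming two hypothetical two-level spectra (gaps of 1 eV and 1 meV, chosen only for contrast); the constants are the exact SI values of $k$ and of the electron volt.

```python
import numpy as np

k_B = 1.380649e-23        # Boltzmann constant in J/K (exact in the SI)
eV = 1.602176634e-19      # one electron volt in J (exact in the SI)
T = 300.0                 # room temperature in K
beta = 1.0 / (k_B * T)    # roughly 2.4e20 per joule

def occupation(energies, beta):
    """Canonical weights exp(-beta E_k) / Z for a spectrum shifted so that E_0 = 0."""
    w = np.exp(-beta * np.asarray(energies, dtype=float))
    return w / w.sum()

# Hypothetical two-level spectra: a 1 eV gap (electronic scale) and a 1 meV gap.
p_large_gap = occupation([0.0, 1.0 * eV], beta)
p_small_gap = occupation([0.0, 1e-3 * eV], beta)
```

With a 1 eV gap the excited-state weight is about $e^{-39}$, so $\rho \approx \psi_0\psi_0^*$; with a 1 meV gap both levels are almost equally occupied, which is why the gap must be large compared with $kT$.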

The fact that the ground state is the most relevant state is the basis for fields like quantum 
chemistry: For the calculation of electron configuration and the corresponding energies of 
molecules at fixed positions of the nuclei, it suffices to know the ground state. An exception 
is to be made for laser chemistry, where a few excited states become relevant. To compute the ground state, one must solve (1.8) for the electron wave function $\psi_0$, which, because of the minimality condition, is a global optimization problem in an infinite-dimensional space. The Hartree-Fock method and its generalizations are used to solve this problem in some discretization, and the various solution techniques are routinely available in modern quantum chemistry packages.

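Applying the Schrodinger equation to the pure state $\rho = \psi\psi^*$ and noting that $H^* = H$, one finds that

$$ i\hbar\frac{d\rho}{dt} = i\hbar(\dot\psi\psi^* + \psi\dot\psi^*) = (i\hbar\dot\psi)\psi^* - \psi(i\hbar\dot\psi)^* = H\psi\psi^* - \psi(H\psi)^* = H\rho - \rho H, $$

giving the quantum Liouville equation

$$ \frac{d\rho}{dt} = \frac{i}{\hbar}[\rho, H] = \rho\angle H. \tag{1.10} $$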
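The derivation of (1.10) can be checked numerically for a finite-dimensional pure state. This sketch assumes units with $\hbar = 1$, a random Hermitian matrix as a stand-in Hamiltonian, and the quantum Lie product $f\angle g = \frac{i}{\hbar}[f, g]$ (the convention under which (1.10) reads $\dot\rho = \rho\angle H$).

```python
import numpy as np

hbar = 1.0                                   # units with hbar = 1 (assumption)
rng = np.random.default_rng(2)
n = 4
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (M + M.conj().T) / 2                     # random Hermitian stand-in Hamiltonian
E, V = np.linalg.eigh(H)

psi0 = rng.normal(size=n) + 1j * rng.normal(size=n)
psi0 /= np.linalg.norm(psi0)

def U(t):
    """Propagator exp(-i t H / hbar), built from the spectral decomposition of H."""
    return V @ np.diag(np.exp(-1j * t * E / hbar)) @ V.conj().T

def rho(t):
    """Pure state rho(t) = psi(t) psi(t)^* with psi(t) = U(t) psi0."""
    psi = U(t) @ psi0
    return np.outer(psi, psi.conj())

def lie(f, g):
    """Quantum Lie product f ∠ g = (i/hbar)[f, g] (assumed convention)."""
    return 1j / hbar * (f @ g - g @ f)

t, h = 0.4, 1e-5
drho_dt = (rho(t + h) - rho(t - h)) / (2 * h)   # central difference in t
```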



1.4 The Heisenberg picture 

In the beginning of quantum mechanics there were two independent formulations: the Heisenberg picture (discovered in 1925 by Heisenberg) and the Schrodinger picture (discovered in 1926 by Schrodinger). Although the formulations seem different at first sight, they were quickly shown to be completely equivalent. In the Schrodinger picture, the physical configuration is described by a time-dependent state vector in a Hilbert space, and the observables are time-independent operators on this Hilbert space. In the Heisenberg picture this is the other way around: the vector is time-independent and the observables are time-dependent operators.

The connection between the two pictures comes from noting that everything in physics that is objective, in the sense that it can be verified repeatedly, is computable from expectation values (here representing averages over repeated observations), and that the time-dependent expectations

$$ \langle f\rangle_t = \int\rho(t)f = \int f\rho(t) \tag{1.11} $$

(remember that the quantum integral (1.2) is a trace!) in the Schrodinger picture can be alternatively written in the Heisenberg picture as

$$ \langle f\rangle_t = \int\rho f(t) = \int f(t)\rho. \tag{1.12} $$

The traditional view of classical mechanics corresponds to the Heisenberg picture: the observables depend on time and the density is time-independent. However, both pictures can be used in classical mechanics, too.
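In the quantum case the equality of (1.11) and (1.12) is just cyclicity of the trace. A minimal NumPy sketch, assuming units with $\hbar = 1$ and random Hermitian matrices as hypothetical stand-ins for $H$ and an observable $f$: the state is evolved as $U\rho U^*$ in the Schrodinger picture, the observable as $U^*fU$ in the Heisenberg picture.

```python
import numpy as np

hbar = 1.0                               # units with hbar = 1 (assumption)
rng = np.random.default_rng(3)
n = 4

def random_hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

H = random_hermitian(n)                  # stand-in Hamiltonian
f = random_hermitian(n)                  # stand-in observable
E, V = np.linalg.eigh(H)

def U(t):
    """Propagator exp(-i t H / hbar)."""
    return V @ np.diag(np.exp(-1j * t * E / hbar)) @ V.conj().T

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())          # initial density operator

t = 1.3
rho_t = U(t) @ rho @ U(t).conj().T       # Schrodinger picture: the state evolves
f_t = U(t).conj().T @ f @ U(t)           # Heisenberg picture: the observable evolves

schrodinger = np.trace(rho_t @ f)        # (1.11), with the trace as quantum integral
heisenberg = np.trace(rho @ f_t)         # (1.12)
```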

To transform the Heisenberg picture description to the Schrodinger picture, we note that the Heisenberg expectations (1.12) satisfy

$$ \frac{d\langle f\rangle_t}{dt} = \int\rho\,\dot f(t) = \int\rho\,\{f(t), H\} = \langle\{f, H\}\rangle_t, $$

giving the differential equation

$$ \frac{d}{dt}\langle f\rangle_t = \langle\{f, H\}\rangle_t \tag{1.13} $$

for the expectations. An equivalent description in the Schrodinger picture expresses the same dynamical law using the Schrodinger expectations (1.11). To deduce the dynamics of $\rho(t)$ we need the following formula, which can be justified for concrete Poisson brackets with integration by parts,

$$ \int\{f, g\}h = \int f\{g, h\}, \tag{1.14} $$

cf. (1.18) below. Using this, we find as consistency condition that $\frac{d}{dt}\langle f\rangle_t = \int f\dot\rho(t)$ and $\langle\{f, H\}\rangle_t = \int\{f, H\}\rho(t) = \int f\{H, \rho(t)\}$ must agree for all $f$. This dictates the classical Liouville equation

$$ \dot\rho(t) = \{H, \rho(t)\}. \tag{1.15} $$

In the quantum case, the Heisenberg and Schrodinger formulations are equivalent if the dynamics of $f(t)$ is given by the quantum Heisenberg equation

$$ i\hbar\frac{d}{dt}f(t) = [f(t), H]. $$

To check this, one may proceed in the same way as we did for the classical case above. Using the Lie product notation (1.3), the dynamics for expectations takes the form $\frac{d}{dt}\langle f\rangle_t = \langle H\angle f\rangle_t$, the Heisenberg equation becomes

$$ \dot f = H\angle f, \tag{1.16} $$

and the Liouville equation becomes

$$ \dot\rho = \rho\angle H \tag{1.17} $$

(note that here $H$ appears in the opposite order!), independent of whether we consider classical or quantum mechanics.

We find the remarkable result that these equations look identical whether we consider classical or quantum mechanics; moreover, they are linear in $f$ although they encode all the intricacies of nonlinear classical or quantum dynamics. Thus, on the statistical level, classical and quantum mechanics look formally simple and identical in structure.

The only difference is in the definition of the Lie product and the integral. 

The connection is in fact much deeper; we shall see that classical mechanics and quantum mechanics are two applications of the same mathematical formalism. At the present stage, we get additional hints for this by noting that, as we shall see later, both the classical and the quantum Lie product satisfy the Jacobi identity

$$ f\angle(g\angle h) + g\angle(h\angle f) + h\angle(f\angle g) = 0, $$

hence define a Lie algebra structure; cf. Section 8.1. They also satisfy the Leibniz identity

$$ g\angle fh = (g\angle f)h + f(g\angle h), $$

characteristic of a Poisson algebra; cf. Section 9.1. Integrating the Leibniz identity and using (1.4) gives

$$ 0 = \int g\angle fh = \int\big((g\angle f)h + f(g\angle h)\big) = -\int(f\angle g)h + \int f(g\angle h), $$

hence the integration by parts formula

$$ \int f(g\angle h) = \int(f\angle g)h. \tag{1.18} $$
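For the quantum Lie product, the Jacobi identity, the Leibniz identity, and the integration by parts formula (1.18) can all be checked on random matrices. The sketch assumes $f\angle g = \frac{i}{\hbar}[f, g]$ with $\hbar = 1$ and the trace as quantum integral; the random $3\times 3$ matrices are arbitrary test data.

```python
import numpy as np

hbar = 1.0                                   # units with hbar = 1 (assumption)
rng = np.random.default_rng(4)
n = 3

def rand(n):
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def lie(f, g):
    """Quantum Lie product f ∠ g = (i/hbar)[f, g] (assumed convention)."""
    return 1j / hbar * (f @ g - g @ f)

f, g, h = rand(n), rand(n), rand(n)

jacobi = lie(f, lie(g, h)) + lie(g, lie(h, f)) + lie(h, lie(f, g))
leibniz = lie(g, f @ h) - (lie(g, f) @ h + f @ lie(g, h))
# Integration by parts (1.18), with the quantum integral taken to be the trace.
lhs = np.trace(f @ lie(g, h))
rhs = np.trace(lie(f, g) @ h)
```

All three identities hold exactly for matrices; (1.18) reduces to $\operatorname{tr}(fhg) = \operatorname{tr}(gfh)$, i.e., cyclicity of the trace.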

Those with some background in Lie algebras will recognize (1.18) as the property that $\int fg$ defines a bilinear form with the properties characteristic of the Killing form of a Lie algebra, again both in the classical and the quantum case. Finally, the Poisson algebra of quantities carries, both in the classical case and in the quantum case, an intrinsic real structure given by an involutive mapping $*$ satisfying $f^{**} = f$ and natural compatibility relations with the algebraic operations: $f^*$ is the complex conjugate in the classical case, and the adjoint in the quantum case.

Thus the common structure of classical mechanics and quantum mechanics is encoded in the algebraic structure of a Poisson *-algebra. This algebraic structure is built up in the course of the book, and is then exploited to analyze one of the characteristic quantum features of nature: the spectral lines visible in light emanating from the sun, or from some chemical compound brought into the flame of a Bunsen burner.

1.5 Rotations



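The rotation group. Let us consider a general time-dependent rotation in $\mathbb{R}^3$, given by a $3\times 3$ matrix $Q(t)$ satisfying $Q(t)Q(t)^T = Q(t)^TQ(t) = 1$ and $\det Q(t) = 1$. Differentiating $Q(t)Q(t)^T = 1$ we get

$$ \dot Q(t)Q(t)^T + Q(t)\dot Q(t)^T = 0. $$

Calling $\Omega(t) = \dot Q(t)Q(t)^T = \dot Q(t)Q(t)^{-1}$, we thus have

$$ \Omega + \Omega^T = 0, $$

that is, $\Omega$ is antisymmetric. We can therefore parameterize $\Omega$ as

$$ \Omega = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}. $$

We then have $\Omega v = \omega\times v$, where $\omega$ is the vector $(\omega_1, \omega_2, \omega_3)^T$. We view $\Omega(t)$ as a matrix $X(\omega(t))$ depending on the vector $\omega(t)$, called the angular velocity.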
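The angular velocity can be read off numerically from any smooth family of rotations. A minimal sketch, assuming a hypothetical rotation $Q(t)$ (rotation about the $z$-axis by $2t$ followed by rotation about the $x$-axis by $t$) and a central difference for $\dot Q$; it confirms that $\Omega = \dot QQ^T$ is antisymmetric and acts as $v \mapsto \omega\times v$.

```python
import numpy as np

def X(w):
    """Antisymmetric matrix with X(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def Q(t):
    """Hypothetical time-dependent rotation: about z by angle 2t, then about x by angle t."""
    a, b = t, 2 * t
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Rz = np.array([[np.cos(b), -np.sin(b), 0], [np.sin(b), np.cos(b), 0], [0, 0, 1]])
    return Rx @ Rz

t, h = 0.5, 1e-6
Q_dot = (Q(t + h) - Q(t - h)) / (2 * h)      # central difference for dQ/dt
Omega = Q_dot @ Q(t).T                       # Omega = Q' Q^T
omega = np.array([Omega[2, 1], Omega[0, 2], Omega[1, 0]])   # read off (w1, w2, w3)
v = np.array([0.3, -1.2, 0.7])
```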

Intuitively it is clear that a rotation is an invertible map from $\mathbb{R}^3$ to $\mathbb{R}^3$ which is linear and preserves angles, distances and orientation. To make the intuition more precise we define the group of rotations, also called the special orthogonal group $SO(3)$, by

$$ SO(3) := \{Q \in \mathbb{R}^{3\times 3} \mid Q^TQ = 1,\ \det Q = 1\}. $$

The group $SO(3)$ is a basic example of a Lie group; it is both a group and a smooth manifold. A general group element is a rotation, and although this might seem simple enough, the group is still quite complicated; in fact, the tangent space at the identity of the group carries enough structure for most analysis. The tangent space of a Lie group at the identity is a vector space with an additional structure, which is inherited from the group structure. Such a vector space is called a Lie algebra, which we will define later in this book; the reader not yet familiar with Lie algebras can skip this part or look up the definition now. The Lie algebra of $SO(3)$ is denoted $so(3)$ and can be identified with infinitesimal rotations. So we consider $Q = 1 + A + O(\|A\|^2)$, where the Landau symbol $O(\cdot)$ is used to indicate higher order terms, and write out $Q^TQ = 1$ to get

$$ 1 = (1 + A^T + O(\|A\|^2))(1 + A + O(\|A\|^2)) = 1 + A + A^T + O(\|A\|^2), $$

and hence we have

$$ so(3) = \{A \in \mathbb{R}^{3\times 3} \mid A + A^T = 0\}. $$

The Lie product of $so(3)$ is given by the commutator: $A\angle B = AB - BA$. We can write any antisymmetric $3\times 3$ matrix as

$$ X(\omega) = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix} $$

for some $\omega = (\omega_1, \omega_2, \omega_3)^T \in \mathbb{R}^3$. The parametrization is chosen such that $X(\omega)v = \omega\times v$. Geometrically, $X(\omega)$ describes an infinitesimal rotation around an axis pointing in the direction of $\omega$; $e^{X(\omega)}$ is the rotation around this axis by the angle $|\omega|$ in the positive (counterclockwise) direction.

Every rotation has the form $e^A$ for some antisymmetric real $3\times 3$ matrix $A$; this is a nontrivial result in the theory of Lie groups, where one proves that the exponential map is surjective for compact connected groups, and $SO(3)$ is compact and connected. For the special orthogonal group $SO(3)$ we give the explicit formula, called the Rodrigues formula,

$$ \exp X(a) = 1 + \frac{\sin|a|}{|a|}X(a) + \frac{1 - \cos|a|}{|a|^2}X(a)^2 \quad \text{if } a \ne 0, \tag{1.19} $$

($|a| = \sqrt{a^Ta}$) for the exponential of a real antisymmetric $3\times 3$ matrix $X(a)$, which can be proven by differentiating $\exp X(ta)$ with respect to $t$ and using $X(a)^3 = -|a|^2X(a)$.
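The Rodrigues formula (1.19) can be compared against a direct evaluation of the matrix exponential. In this sketch the exponential is computed by a truncated Taylor series (an illustrative choice that is adequate for small $\|X(a)\|$), and the vector `a` is arbitrary test data.

```python
import numpy as np

def X(a):
    """Antisymmetric matrix with X(a) @ v == np.cross(a, v)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def rodrigues(a):
    """exp X(a) via the Rodrigues formula (1.19); returns the identity for a = 0."""
    a = np.asarray(a, dtype=float)
    r = np.linalg.norm(a)
    if r == 0.0:
        return np.eye(3)
    Xa = X(a)
    return np.eye(3) + np.sin(r) / r * Xa + (1 - np.cos(r)) / r**2 * (Xa @ Xa)

def expm_series(A, terms=40):
    """Matrix exponential by a truncated Taylor series sum_k A^k / k!."""
    result, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

a = np.array([0.4, -1.1, 0.8])
R = rodrigues(a)
```

The result is orthogonal with determinant 1 and leaves the axis $a$ fixed, since $X(a)a = a\times a = 0$.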

Above, the vector product appeared in formulae involving $so(3)$; we now make this more explicit by showing that $\mathbb{R}^3$ equipped with the vector product is in fact isomorphic to $so(3)$ as a Lie algebra. First we check that $\mathbb{R}^3$ with the vector product is indeed a Lie algebra. The antisymmetry is obvious: $x\angle y = x\times y = -y\times x = -y\angle x$. For the Jacobi identity we use the formula

$$ x\times(y\times z) = y(x\cdot z) - z(x\cdot y) $$

to obtain

$$ \begin{aligned} a\times(b\times c) + b\times(c\times a) + c\times(a\times b) &= b(a\cdot c) - c(a\cdot b) + c(b\cdot a) - a(b\cdot c) + a(c\cdot b) - b(c\cdot a) \\ &= 0. \end{aligned} $$

To establish the isomorphism we use the map $\mathbb{R}^3 \to so(3)$ given by $\omega \mapsto X(\omega)$. We need to show $X(\omega)\angle X(\omega') = X(\omega\times\omega')$. This follows immediately since the commutator of $X(\omega)$ and $X(\omega')$ acts on a vector $u$ as

$$ \omega\times(\omega'\times u) - \omega'\times(\omega\times u), $$

and using the Jacobi identity in $\mathbb{R}^3$, we see that this equals

$$ (\omega\times\omega')\times u = X(\omega\times\omega')u. $$
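The isomorphism $\omega \mapsto X(\omega)$ can be spot-checked on random vectors: the matrix commutator of $X(\omega)$ and $X(\omega')$ should equal $X(\omega\times\omega')$. A minimal NumPy sketch with arbitrary random test vectors:

```python
import numpy as np

def X(w):
    """Antisymmetric matrix with X(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

rng = np.random.default_rng(5)
w1, w2, u = rng.normal(size=(3, 3))          # three random vectors in R^3

commutator = X(w1) @ X(w2) - X(w2) @ X(w1)   # X(w1) ∠ X(w2)
```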



Taking a closer look at $X(\omega)$ we see that there is a basis of $so(3)$ consisting of three elements $J_i$ with $X(\omega) = \sum_{i=1}^3\omega_iJ_i$, corresponding to infinitesimal rotations around the coordinate axes. We assemble the three $J$'s in a column vector and (ab)use the notation $X(\omega) = \omega\cdot J$. Writing out $X(\omega)\angle X(\omega') = X(\omega\times\omega')$ we get

$$ J_k\angle J_l = \sum_m\epsilon_{klm}J_m, \tag{1.20} $$

where $\epsilon_{klm}$ is the Levi-Civita symbol given by

$$ \epsilon_{klm} = \begin{cases} 1 & \text{if } klm \text{ is an even permutation of } 123, \\ -1 & \text{if } klm \text{ is an odd permutation of } 123, \\ 0 & \text{otherwise.} \end{cases} \tag{1.21} $$

Thus $\epsilon_{klm}$ is completely antisymmetric in the indices $k, l, m$, where $1 \le k, l, m \le 3$. Therefore, for $k \ne l$ the summation above contains only one nonzero term. For example, $J_1\angle J_2 = J_3$, and all other Lie products can be found using a cyclic permutation.
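The structure constants (1.20) can be generated mechanically from the Levi-Civita symbol (1.21). In this sketch indices $0, 1, 2$ stand for $1, 2, 3$, the sign of a permutation is computed by counting inversions, and the basis is taken as $J_i = X(e_i)$, i.e. $(J_i)_{ab} = -\epsilon_{iab}$, which matches the parametrization $X(\omega) = \omega\cdot J$.

```python
import numpy as np
from itertools import permutations

# Levi-Civita symbol eps[k, l, m]; entries for repeated indices stay 0.
eps = np.zeros((3, 3, 3))
for perm in permutations(range(3)):
    inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
    eps[perm] = (-1) ** inversions           # +1 for even, -1 for odd permutations

# Basis J_i = X(e_i) of so(3): (J_i)_{ab} = -eps[i, a, b].
J = [-eps[i] for i in range(3)]

def lie(A, B):
    """Lie product of so(3): the matrix commutator."""
    return A @ B - B @ A
```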

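The Pauli matrices. A discussion of the relation between $su(2)$ and $so(3)$ will be useful for later. The Lie algebra $su(2)$ is defined as the real vector space of traceless antihermitian $2\times 2$ matrices equipped with the commutator as Lie product. We introduce the Pauli matrices

$$ \sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{1.22} $$

Assembling the last three in the vector $\sigma = (\sigma_1, \sigma_2, \sigma_3)^T$, we write for any $p \in \mathbb{R}^3$

$$ p\cdot\sigma := p_1\sigma_1 + p_2\sigma_2 + p_3\sigma_3 = \begin{pmatrix} p_3 & p_1 - ip_2 \\ p_1 + ip_2 & -p_3 \end{pmatrix}. \tag{1.23} $$

The matrices $i\sigma_1, i\sigma_2, i\sigma_3$ form a basis for the Lie algebra $su(2)$; indeed, any traceless antihermitian $2\times 2$ matrix can be written as $ip\cdot\sigma$ for some $p \in \mathbb{R}^3$. The Pauli matrices satisfy

$$ [\sigma_k, \sigma_l] = 2i\sum_m\epsilon_{klm}\sigma_m, \qquad \sigma_k\sigma_l + \sigma_l\sigma_k = 2\delta_{kl}\sigma_0, \qquad 1 \le k, l, m \le 3, $$

with the Levi-Civita symbol $\epsilon_{klm}$ defined in (1.21). We thus obtain

$$ [p\cdot\sigma, q\cdot\sigma] = 2i(p\times q)\cdot\sigma, \qquad p\cdot\sigma\,q\cdot\sigma + q\cdot\sigma\,p\cdot\sigma = 2p\cdot q\,\sigma_0. $$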
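The commutator and anticommutator identities for $p\cdot\sigma$ can be confirmed with explicit matrices. A minimal NumPy sketch using the standard Pauli matrices and arbitrary test vectors `p`, `q`:

```python
import numpy as np

sigma0 = np.eye(2, dtype=complex)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def dot_sigma(p):
    """p . sigma = p1 sigma1 + p2 sigma2 + p3 sigma3, cf. (1.23)."""
    return sum(pk * sk for pk, sk in zip(p, sigma))

p = np.array([0.3, -0.8, 1.5])
q = np.array([1.1, 0.4, -0.6])
P, Q = dot_sigma(p), dot_sigma(q)

commutator = P @ Q - Q @ P          # should be 2i (p x q) . sigma
anticommutator = P @ Q + Q @ P      # should be 2 (p . q) sigma0
```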

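The matrices $i\sigma_0, i\sigma_1, i\sigma_2, i\sigma_3$ form a basis of the Lie algebra $u(2)$; only $i\sigma_0$ is not traceless. Introducing

$$ L_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{pmatrix},\quad L_2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{pmatrix},\quad L_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, $$

which form a basis of $so(3)$, we see that the correspondence $-\frac{i}{2}\sigma_k \mapsto L_k$ for $k = 1, 2, 3$ defines an isomorphism of Lie algebras $su(2) \cong so(3)$. Clearly, $i\sigma_0$ spans the center of the Lie algebra $u(2)$. As a consequence, we can write $u(2) = \mathbb{R} \oplus su(2) \cong \mathbb{R} \oplus so(3)$.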
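That the correspondence between $su(2)$ and $so(3)$ respects the Lie products can be checked on the basis elements. This sketch assumes $L_k = X(e_k)$ (as written out above) and the images $T_k = -\frac{i}{2}\sigma_k$; both families should satisfy $[A_k, A_l] = A_m$ for cyclic $(k, l, m)$.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
L = [np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], dtype=float),
     np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], dtype=float),
     np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)]

T = [-0.5j * s for s in sigma]        # elements -(i/2) sigma_k of su(2)

def comm(A, B):
    return A @ B - B @ A

cyclic = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
```

The check below also confirms that each $T_k$ is traceless and antihermitian, i.e. lies in $su(2)$.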

The isomorphism $SO(3) \cong SU(2)/\mathbb{Z}_2$. Recall that a real $3\times 3$ matrix $R$ is called special orthogonal if

$$ R^TR = 1, \qquad \det R = 1. $$

Note that $1$ here denotes the $3\times 3$ identity matrix in the first equation. The special orthogonal matrices form the group $SO(3)$.

If $\lambda$ is an eigenvalue of $R \in SO(3)$, we see that $|\lambda| = 1$, since $R$ preserves lengths. We want to show that there is always an eigenvector with eigenvalue 1. The characteristic polynomial of $R$ has three roots $\lambda_1$, $\lambda_2$ and $\lambda_3$, with $\lambda_1\lambda_2\lambda_3 = \det R = 1$. The absolute value of the roots has to be 1, and for every nonreal eigenvalue $\mu$, its conjugate $\bar\mu$ is an eigenvalue, too. If all three eigenvalues are real, then the only possibilities are that all three are 1 or that two are $-1$ and the third is 1. Let now $\lambda_1$ be nonreal and take $\lambda_2 = \bar\lambda_1$. Then $\lambda_3 = (\lambda_1\bar\lambda_1)^{-1}$ is real and positive; and since it has absolute value one, $\lambda_3 = 1$. We see that there is always an eigenvalue 1. If $R$ is a rotation and not the identity, the eigenspace for the eigenvalue 1 is one-dimensional; we denote a spanning eigenvector by $e_R$, so that $Re_R = e_R$. The vector $e_R$ determines a one-dimensional subspace of $\mathbb{R}^3$ that is left invariant under the action of $R$. We call this one-dimensional invariant subspace the axis of rotation.
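The axis of a given rotation can be computed as the null space of $R - 1$. In this sketch the null vector is taken from a singular value decomposition (an implementation choice that avoids the complex eigenvectors `numpy.linalg.eig` may return), and the test rotation `Rz(0.8) @ Rx(1.1)` is arbitrary.

```python
import numpy as np

def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Rz(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def axis_of_rotation(R):
    """Unit vector e_R with R e_R = e_R, for R in SO(3), R != 1 (the sign is arbitrary).

    The null space of R - 1 is spanned by the right singular vector that belongs
    to the smallest singular value (the last row of Vt).
    """
    _, _, Vt = np.linalg.svd(R - np.eye(3))
    e = Vt[-1]
    return e / np.linalg.norm(e)

R = Rz(0.8) @ Rx(1.1)              # a rotation that is not about a coordinate axis
e_R = axis_of_rotation(R)
eigenvalues = np.linalg.eigvals(R)
```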




Figure 1.1: Rotation around $e_R$. Any axis is characterized by two angles $\varphi$ and $\theta$.



Consider an arbitrary $SO(3)$ element $R$ with axis of rotation determined by $e_R$ over an angle $\psi$, and denote the rotation by $R(e_R, \psi)$. Let $\theta$ be the angle between the plane spanned by $e_R$ and the $z$-axis and the $xz$-plane, and let $\varphi$ be the angle between $e_R$ and the $z$-axis; cf. Figure 1.1. The rotation can be broken down into three rotations. First we use two rotations to go to a coordinate system with coordinates $x'$, $y'$ and $z'$ in which $e_R$ points in the $z'$-direction, and then we rotate around the $z'$-axis over an angle $\psi$. The two rotations to go to the new coordinate system are: (a) a rotation around the $z$-axis over an angle $\theta$ to align the $x$-axis with the projection of $e_R$ onto the $xy$-plane, (b) a rotation over an angle $\varphi$ around the image of the $y$-axis under the first rotation. Hence, in terms of the elementary rotations $R_y$ and $R_z$ given below, we can write $R(e_R, \psi)$ as a product

$$ R(e_R, \psi) = R_z(\theta)R_y(\varphi)R_z(\psi)R_y(\varphi)^{-1}R_z(\theta)^{-1}, \qquad e_R = \begin{pmatrix} \sin\varphi\cos\theta \\ \sin\varphi\sin\theta \\ \cos\varphi \end{pmatrix}. $$

In this way we obtain a system of coordinates on the manifold $SO(3)$. The three angles are then called the Euler angles.
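The decomposition into a change of coordinates followed by a rotation about $z'$ can be checked against the Rodrigues formula. This sketch assumes the product $R(e_R, \psi) = R_z(\theta)R_y(\varphi)R_z(\psi)R_y(\varphi)^{-1}R_z(\theta)^{-1}$ with $e_R = (\sin\varphi\cos\theta, \sin\varphi\sin\theta, \cos\varphi)^T$ and the standard elementary rotations; the three angles are arbitrary test values.

```python
import numpy as np

def Ry(b):
    return np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])

def Rz(g):
    return np.array([[np.cos(g), -np.sin(g), 0], [np.sin(g), np.cos(g), 0], [0, 0, 1]])

def rotation_about_axis(theta, phi, psi):
    """R(e_R, psi) = Rz(theta) Ry(phi) Rz(psi) Ry(phi)^{-1} Rz(theta)^{-1}."""
    B = Rz(theta) @ Ry(phi)          # maps the z-axis onto e_R
    return B @ Rz(psi) @ B.T         # B^{-1} = B^T for rotations

def rodrigues(e, psi):
    """Rotation by angle psi about the unit vector e ((1.19) with a = psi * e)."""
    K = np.array([[0, -e[2], e[1]], [e[2], 0, -e[0]], [-e[1], e[0], 0]])
    return np.eye(3) + np.sin(psi) * K + (1 - np.cos(psi)) * K @ K

theta, phi, psi = 0.9, 0.6, 1.7
e_R = np.array([np.sin(phi) * np.cos(theta), np.sin(phi) * np.sin(theta), np.cos(phi)])
R = rotation_about_axis(theta, phi, psi)
```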

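We note in particular the following: The group $SO(3)$ is generated by all elementary rotations $R_x(\alpha)$, $R_y(\beta)$ and $R_z(\gamma)$ given by

$$ R_x(\alpha) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix},\quad R_y(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix},\quad R_z(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}. $$
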
We now show that $SU(2)$ is a real manifold that is diffeomorphic to the three-sphere $S^3$. We do this by finding an explicit parametrization of $SU(2)$ in terms of two complex numbers $x$ and $y$ satisfying $|x|^2 + |y|^2 = 1$, which defines the three-sphere. We write an element $g \in SU(2)$ as

$$ g = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. $$

Writing out the equations $g^*g = 1$ and $\det g = 1$, one finds the following equations:

$$ |a|^2 + |c|^2 = 1, \qquad |b|^2 + |d|^2 = 1, \qquad \bar ab + \bar cd = 0, \qquad ad - bc = 1. $$

We first assume $b = 0$ and find then that $ad = 1$ and $\bar cd = 0$, implying that $c = 0$ and $g$ is diagonal with $a = \bar d$. Next we suppose $b \ne 0$ and use $\bar a = -\bar cd/b$ to deduce that $|b| = |c|$ and $|a| = |d|$; we thus have $b \ne 0 \Leftrightarrow c \ne 0$. We also see that we can use the ansatz

$$ a = e^{i\alpha}\cos\theta, \qquad b = e^{i\beta}\sin\theta, \qquad c = -e^{i\gamma}\sin\theta, \qquad d = e^{i\delta}\cos\theta. $$

Using again $\bar a = -\bar cd/b$, we see $\alpha + \delta = \beta + \gamma$, and writing out $ad - bc = 1$ we find $\alpha = -\delta$ and $\beta = -\gamma$ (modulo $2\pi$). We thus see $a = \bar d$ and $b = -\bar c$. Hence the most general element of $SU(2)$ can be written as

$$ g(x, y) = \begin{pmatrix} x & y \\ -\bar y & \bar x \end{pmatrix}, \qquad \text{with } |x|^2 + |y|^2 = 1. $$

The map $S^3 \to SU(2)$ given by $(x, y) \mapsto g(x, y)$ is clearly injective, and from the above analysis bijective. Furthermore, the map and its inverse $g \mapsto (g_{11}, g_{12})$ are smooth. Hence we conclude that $SU(2) \cong S^3$ as a real manifold.
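The parametrization can be exercised on a random point of $S^3$. A minimal sketch, assuming a random unit vector in $\mathbb{R}^4$ split into the complex numbers $x$ and $y$:

```python
import numpy as np

def g(x, y):
    """The SU(2) element g(x, y) = [[x, y], [-conj(y), conj(x)]], |x|^2 + |y|^2 = 1."""
    return np.array([[x, y], [-np.conj(y), np.conj(x)]], dtype=complex)

rng = np.random.default_rng(6)
v = rng.normal(size=4)
v /= np.linalg.norm(v)                    # a random point on the unit sphere S^3 in R^4
x, y = complex(v[0], v[1]), complex(v[2], v[3])
G = g(x, y)
```

Unitarity and $\det G = |x|^2 + |y|^2 = 1$ hold, and $(x, y)$ is recovered from the first row, which is the inverse map used above.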

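The Pauli matrices (1.22) are Hermitian and satisfy

$$ \operatorname{tr}(\sigma^k) = 0, \qquad \operatorname{tr}(\sigma^k\sigma^l) = 2\delta^{kl}, \qquad \operatorname{tr}([\sigma^k, \sigma^l]\sigma^m) = 4i\epsilon^{klm}. $$

Every vector $p \in \mathbb{R}^3$ we identify with the traceless Hermitian matrix $\hat p = \sigma^1p_1 + \sigma^2p_2 + \sigma^3p_3$, an element of $i\,su(2)$. We see that $p\times q$ corresponds to the element $\frac{1}{2i}[\hat p, \hat q]$, and similarly we find

$$ p\cdot q = \frac12\operatorname{tr}(\hat p\hat q), \qquad p\times q\cdot r = \frac{1}{4i}\operatorname{tr}([\hat p, \hat q]\hat r). $$

If $g$ is an element of $SU(2)$ and $\hat p \in i\,su(2) \cong \mathbb{R}^3$, we see that

$$ \operatorname{tr}(g\hat pg^{-1}) = 0, \qquad (g\hat pg^{-1})^* = (g^{-1})^*\hat p^*g^* = g\hat pg^{-1}, $$

and we conclude that $g$ induces a map $\mathbb{R}^3 \to \mathbb{R}^3$. Since

$$ \operatorname{tr}(g\hat pg^{-1}g\hat qg^{-1}) = \operatorname{tr}(\hat p\hat q), $$

the map induced by $g \in SU(2)$ defines an element of $O(3)$. But also $p\times q\cdot r$ is invariant under the action of $g$, so the induced map has determinant 1, and thus we found a map $R: SU(2) \to SO(3)$, whereby $g \in SU(2)$ gets mapped to the element $R(g)$ in $SO(3)$ corresponding to $\hat p \mapsto g\hat pg^{-1}$. The map $g \mapsto R(g)$ is a group homomorphism; that is, $R(g_1g_2) = R(g_1)R(g_2)$. Explicitly we find

$$ \begin{aligned} g(x,y)\sigma^1g(x,y)^{-1} &= \mathrm{Re}(x^2 - y^2)\sigma^1 - \mathrm{Im}(x^2 - y^2)\sigma^2 + 2\,\mathrm{Re}(\bar xy)\sigma^3, \\ g(x,y)\sigma^2g(x,y)^{-1} &= \mathrm{Im}(x^2 + y^2)\sigma^1 + \mathrm{Re}(x^2 + y^2)\sigma^2 + 2\,\mathrm{Im}(x\bar y)\sigma^3, \\ g(x,y)\sigma^3g(x,y)^{-1} &= -2\,\mathrm{Re}(xy)\sigma^1 + 2\,\mathrm{Im}(xy)\sigma^2 + (|x|^2 - |y|^2)\sigma^3. \end{aligned} $$

Therefore

$$ R(g(x,y)) = \begin{pmatrix} \mathrm{Re}(x^2 - y^2) & \mathrm{Im}(x^2 + y^2) & -2\,\mathrm{Re}(xy) \\ -\mathrm{Im}(x^2 - y^2) & \mathrm{Re}(x^2 + y^2) & 2\,\mathrm{Im}(xy) \\ 2\,\mathrm{Re}(\bar xy) & 2\,\mathrm{Im}(x\bar y) & |x|^2 - |y|^2 \end{pmatrix}. $$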



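We find

$$ R(g(\cos\tfrac{\alpha}{2}, -i\sin\tfrac{\alpha}{2})) = R_x(\alpha), \qquad R(g(\cos\tfrac{\beta}{2}, -\sin\tfrac{\beta}{2})) = R_y(\beta), \qquad R(g(e^{-i\gamma/2}, 0)) = R_z(\gamma), $$

and hence, since $R$ is a homomorphism and the elementary rotations generate $SO(3)$, the map $R: SU(2) \to SO(3)$ is surjective. Suppose now that $g(x, y)$ is mapped to the identity element in $SO(3)$; then $|x|^2 - |y|^2 = 1$ forces $y = 0$, and $\mathrm{Re}(x^2) = 1$ forces $x = \pm 1$. Thus the kernel of $R: SU(2) \to SO(3)$ consists of $\{\pm 1\}$, which is the central $\mathbb{Z}_2$ subgroup of $SU(2)$. (Easy exercise: Prove that $\mathbb{Z}_2$ is the center of $SU(2)$.) As the kernel of a group homomorphism, it is a normal subgroup. All in all we have shown

$$ SO(3) \cong SU(2)/\mathbb{Z}_2. $$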
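The covering map can be computed directly from its definition and compared with the explicit matrix for $R(g(x,y))$. In this sketch `R_of` reads off $R_{ij}$ as the $\sigma^i$-coefficient of $g\sigma^jg^{-1}$ using $\operatorname{tr}(\sigma^i\sigma^k) = 2\delta^{ik}$, `R_explicit` transcribes the displayed matrix, and the random $SU(2)$ elements and angles are arbitrary test data.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def g(x, y):
    """SU(2) element g(x, y) = [[x, y], [-conj(y), conj(x)]], |x|^2 + |y|^2 = 1."""
    return np.array([[x, y], [-np.conj(y), np.conj(x)]], dtype=complex)

def R_of(G):
    """Matrix of p -> G p^ G^{-1}: R_ij = tr(sigma^i G sigma^j G^{-1}) / 2."""
    Ginv = G.conj().T                         # G is unitary
    return np.array([[0.5 * np.trace(sigma[i] @ G @ sigma[j] @ Ginv).real
                      for j in range(3)] for i in range(3)])

def R_explicit(x, y):
    """The explicit matrix R(g(x, y)) from the text."""
    return np.array([
        [(x**2 - y**2).real, (x**2 + y**2).imag, -2 * (x * y).real],
        [-(x**2 - y**2).imag, (x**2 + y**2).real, 2 * (x * y).imag],
        [2 * (np.conj(x) * y).real, 2 * (x * np.conj(y)).imag, abs(x)**2 - abs(y)**2]])

def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def Ry(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

rng = np.random.default_rng(8)
v1, v2 = rng.normal(size=(2, 4))
v1, v2 = v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)
x1, y1 = complex(v1[0], v1[1]), complex(v1[2], v1[3])
x2, y2 = complex(v2[0], v2[1]), complex(v2[2], v2[3])
G1, G2 = g(x1, y1), g(x2, y2)
```

The checks below confirm the homomorphism property, that $g$ and $-g$ give the same rotation (the $\mathbb{Z}_2$ kernel), and the three preimages of the elementary rotations.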



1.6 Symmetries 

Galilean spacetime. Before the start of the twentieth century, one thought that time was the same for all observers in the following sense: if two events take place at two different places in space, then the question whether the events took place at the same time has an observer-independent answer. Space was thought of as a grid on which the motions of all objects took place, and time was completely independent of space. The 'distance' between two events therefore consists of two numbers: a difference in time and a spatial distance. For example, the distance between when I woke up and when I took the subway to work is characterized by saying that from the moment I woke up it took me half an hour to reach the subway station, which is 500 meters from my bed. We call the spacetime described in this manner the Galilean spacetime.

There are three important kinds of symmetries in the Galilean spacetime, and the group that these symmetries generate is called the Galilean symmetry group¹. If we shift the clock an hour globally, which is possible in Galilean spacetime, the laws of nature cannot alter. Hence one symmetry generator is the time shift: $t \mapsto t + a$ for some fixed number $a$. Likewise, the laws of nature should not change if we shift the origin of our coordinate system; hence a second symmetry is the shift symmetry $(x^1, x^2, x^3) \mapsto (x^1 + b^1, x^2 + b^2, x^3 + b^3)$ for some fixed vector $(b^1, b^2, b^3)$. The third kind of symmetries are rotations, that is, the group $SO(3)$, which we have seen before. There are some additional discrete symmetries, like space reflection, where a vector $(x^1, x^2, x^3)$ is mapped to $(-x^1, -x^2, -x^3)$. We, however, focus on the connected part of the Galilean symmetry group. The subgroup of the Galilean symmetry group obtained by discarding the time translations is the group $ISO(3)$. Below, when we discuss the Poincare group, we give more details on the group $ISO(3)$, as it is a subgroup of the Poincare group.

¹The group is also called the Galilei group or the Galileo group. We follow the tradition which proceeds in analogy with the use of Euclidean metric or Hermitian matrices.



Minkowski spacetime. With the advent of special relativity, the classical spacetime view altered in the sense that time and space made up one spacetime, called Minkowski spacetime. As a topological vector space, Minkowski spacetime is nothing more than $\mathbb{R}^4$, but it is equipped with the Minkowski metric² (also see Section 2.6 and Example 8.4.5):

$$ (x - y)^2 = -(x^0 - y^0)^2 + (x^1 - y^1)^2 + (x^2 - y^2)^2 + (x^3 - y^3)^2 = (\vec x - \vec y)^2 - (x^0 - y^0)^2. $$

The time component of the four-vectors is the zeroth component. We write a general four-vector as

$$ v = \begin{pmatrix} v^0 \\ v^1 \\ v^2 \\ v^3 \end{pmatrix} = \begin{pmatrix} v^0 \\ \vec v \end{pmatrix}, $$

where $\vec v$ is the space-like part of $v$ and $v^0$ is the time-like component of $v$. With the introduced notation we see that the Minkowski metric can be written as $v^2 = -(v^0)^2 + \vec v^2$, where $\vec v^2$ is the square of the usual Euclidean norm of three-vectors. The Minkowski inner product is derived from the Minkowski metric and given by

$$ v\cdot w = -v^0w^0 + \vec v\cdot\vec w. $$


Note that in a strict sense the Minkowski inner product, the Minkowski norm and the 
Minkowski metric are not an inner product, norm and metric respectively as the positivity 
condition is clearly not satisfied. 

The Poincare group is a subgroup of the group of all symmetries that leave the Minkowski metric invariant. The Poincare group is often denoted as $ISO(3,1)$. On a four-vector $v$ the Poincare group acts as $v \mapsto \Lambda v + b$, where $\Lambda$ is an element of $SO(3,1)$ and $b$ is some four-vector. Hence the Poincare group consists of Lorentz transformations (the $SO(3,1)$-rotations) and translations. An explicit representation of $ISO(3,1)$ can be given in terms of the matrices

$$ \begin{pmatrix} \Lambda & b \\ 0 & 1 \end{pmatrix}, $$

where $\Lambda$ is a $4\times 4$ matrix in $SO(3,1)$ and $b$ is a four-vector. Recall that $\Lambda$ is in $SO(3,1)$ if $\det\Lambda = 1$ and $\Lambda$ satisfies

$$ \Lambda^T\eta\Lambda = \eta, \qquad \eta = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. $$

²We choose units such that $c = 1$, and work with the signature $(-, +, +, +)$.



The affine linear transformations contain the translations and $SO(3,1)$-rotations. The generators of the translations we call the momenta, and since they have four components, we sometimes refer to them as four-momenta.

The (real) Lie algebra of $ISO(3,1)$ is described by the matrices of the form

$$ \begin{pmatrix} A & b \\ 0 & 0 \end{pmatrix} $$

with $A \in so(3,1)$ and $b \in \mathbb{R}^4$. The Lie product is given by the commutator of matrices, and takes the form

$$ (A, b)\angle(A', b') = ([A, A'], Ab' - A'b), $$

where $Ab'$ is the usual matrix action of $A$ on $b'$. In particular, we have

$$ (0, b)\angle(0, b') = 0, \qquad (A, 0)\angle(0, b') = (0, Ab'), $$

from which we read off that the translations form a commutative subalgebra. The translations form an ideal such that the momenta form the standard representation of $so(3,1)$, that is, the defining representation. More generally, the standard representation of $so(p,q)$ is the one that defines $so(p,q)$ and is thus given by $(p+q)\times(p+q)$ matrices that leave a metric of signature $(p,q)$ invariant; in Lie algebra theory the standard representation is called the fundamental representation. In the fundamental representation of $so(3,1)$ (which is not unitary), the Minkowski inner product is invariant.
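The commutator formula for the Lie algebra of $ISO(3,1)$ follows from block-matrix multiplication and is easy to confirm numerically. This sketch generates elements of $so(3,1)$ as $A = \eta S$ with $S$ antisymmetric (one convenient way to satisfy the infinitesimal condition $A^T\eta + \eta A = 0$) and embeds $(A, b)$ as a $5\times 5$ matrix; the random data are arbitrary.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])      # Minkowski metric, signature (-,+,+,+)
rng = np.random.default_rng(7)

def random_so31():
    """Random element of so(3,1): A = eta S with S antisymmetric, so A^T eta + eta A = 0."""
    S = rng.normal(size=(4, 4))
    S = S - S.T
    return eta @ S

def embed(A, b):
    """5x5 matrix [[A, b], [0, 0]] representing (A, b) in the Lie algebra of ISO(3,1)."""
    M = np.zeros((5, 5))
    M[:4, :4] = A
    M[:4, 4] = b
    return M

A, A2 = random_so31(), random_so31()
b, b2 = rng.normal(size=4), rng.normal(size=4)
comm = embed(A, b) @ embed(A2, b2) - embed(A2, b2) @ embed(A, b)
```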

The group $SO(3)$ is a subgroup of $SO(3,1)$ and consists of all those $SO(3,1)$-rotations that act trivially on the time component of four-vectors. The Galilean symmetry group, as described above, corresponds to the subgroup of $ISO(3,1)$ consisting of the $SO(3)$-rotations together with the space and time translations.

An element of SO(3,1) is called a Lorentz boost if it acts nontrivially on the zeroth component of four-vectors. By multiplying with an appropriate element of the SO(3) subgroup we may assume that a Lorentz boost only mixes the zeroth and first components of four-vectors. Then a Lorentz boost $L = L(v)$ acts on a four-vector $w$ as follows (recall c = 1):

$$(L(v)w)^0 = \frac{w^0 - v w^1}{\sqrt{1-v^2}}, \qquad (L(v)w)^1 = \frac{w^1 - v w^0}{\sqrt{1-v^2}}, \qquad (1.24)$$

and $(L(v)w)^2 = w^2$, $(L(v)w)^3 = w^3$. Physically the Lorentz boost (1.24) describes how coordinates transform when one goes from one coordinate system to another coordinate system that moves with respect to the first system in the positive $x^1$-direction with velocity v. Since $|v|$ has to be smaller than one, as is apparent from (1.24), one concludes that special relativity excludes superluminal velocities. The number

$$\gamma = \frac{1}{\sqrt{1-v^2}} \qquad (1.25)$$

is called the γ-factor. The γ-factor indicates whether we should treat a physical situation with special relativity or whether a nonrelativistic treatment suffices. The Lorentz contraction factor is the inverse of γ and measures how distances shrink when measured in another coordinate system, moving at a velocity v with respect to the original coordinate system. For α-particles, with a typical speed of 15,000 kilometers per second, we have v ≈ 0.05 and so γ ≈ 1.00125 and γ^{-1} ≈ 0.99875, which implies that if we take a rod of 100 meters and let an α-particle fly along the rod, it measures only about 99.87 m (assuming that α-particles can measure). The γ-factor thus tells us that if we want an accuracy better than about 0.1%, we need to treat the α-particle relativistically.
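The γ-factor for the α-particle example can be computed directly. This Python sketch works in units with c = 1; the value c ≈ 299 792.458 km/s is used only to convert the α-particle speed. It also checks that the boost (1.24) preserves the Minkowski inner product. Function names are illustrative.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def gamma(v):
    """Lorentz gamma-factor 1/sqrt(1 - v^2) for a speed v in units of c (|v| < 1)."""
    if abs(v) >= 1:
        raise ValueError("special relativity requires |v| < 1")
    return 1.0 / math.sqrt(1.0 - v * v)


def boost(v, w):
    """Apply the boost (1.24) in the x^1-direction to the four-vector w = (w0, w1, w2, w3)."""
    g = gamma(v)
    return (g * (w[0] - v * w[1]), g * (w[1] - v * w[0]), w[2], w[3])


def minkowski(w, u):
    """Minkowski inner product with signature (-, +, +, +)."""
    return -w[0] * u[0] + w[1] * u[1] + w[2] * u[2] + w[3] * u[3]


v = 15_000 / C_KM_S  # alpha particle, about 0.05
print(f"v = {v:.4f}, gamma = {gamma(v):.5f}, 1/gamma = {1 / gamma(v):.5f}")
print(f"a 100 m rod measures {100 / gamma(v):.3f} m")

w, u = (2.0, 0.3, -1.0, 0.5), (1.0, 4.0, 2.0, -3.0)
print(math.isclose(minkowski(boost(v, w), boost(v, u)), minkowski(w, u)))
```

Since γ - 1 ≈ v²/2 for small v, relativistic corrections at v ≈ 0.05 are of the order of 0.1%.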



The nonrelativistic limit. In order to discuss the nonrelativistic limit, we restore the velocity of light c in the formulas. For a particle at rest, the space momentum $\mathbf{p}$ vanishes. The formula $\mathbf{p}^2 - p_0^2 = -(mc)^2$ therefore implies that, at rest, $p_0 = mc$, and the rest energy is seen to be $E_0 = p_0c = mc^2$. This suggests defining the kinetic energy (which vanishes at rest) by the formula

$$H := p_0 c - mc^2.$$

Introducing the velocity $\mathbf{v}$ and the speed $v$ by

$$\mathbf{v} = \mathbf{p}/m, \qquad v = |\mathbf{v}| = \sqrt{\mathbf{v}^2},$$

we find from $\mathbf{p}^2 - p_0^2 = -(mc)^2$ that $p_0^2 = (mc)^2 + \mathbf{p}^2$, hence $p_0 = mc\sqrt{1+(v/c)^2}$, so that

$$H = mc^2\left(\sqrt{1+(v/c)^2} - 1\right) = \frac{mc^2\,(v/c)^2}{\sqrt{1+(v/c)^2}+1} = \frac{mv^2}{\sqrt{1+(v/c)^2}+1}.$$

Similarly, the energy becomes

$$E = p_0c = mc^2\sqrt{1+(v/c)^2} = mc^2 + H.$$

Taking the limit $c \to \infty$ we find that H becomes the kinetic energy $\frac{1}{2}mv^2$ of a nonrelativistic particle of mass m. The nonrelativistic approximation $H \approx \frac{1}{2}mv^2$ for the kinetic energy is valid for small velocities $v = |\mathbf{p}/m| \ll c$, where we may neglect the term $(v/c)^2$ in the square root of the denominator.
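To see the limit numerically, the following sketch (illustrative names, arbitrary units with m = v = 1) evaluates H in the form $mv^2/(\sqrt{1+(v/c)^2}+1)$ for increasing c and compares it with $\frac12 mv^2$. The relative deviation equals $(\sqrt{1+x}-1)/(\sqrt{1+x}+1)$ with $x = (v/c)^2$, which behaves like $(v/c)^2/4$.

```python
import math


def kinetic_exact(m, v, c):
    """Relativistic kinetic energy H = p0*c - m*c^2 with p0 = m*c*sqrt(1 + (v/c)^2).

    Uses the algebraically equivalent form m*v^2 / (sqrt(1 + (v/c)^2) + 1),
    which avoids subtracting two nearly equal numbers when v << c.
    """
    x = (v / c) ** 2
    return m * v * v / (math.sqrt(1.0 + x) + 1.0)


def kinetic_newton(m, v):
    """Nonrelativistic kinetic energy m v^2 / 2."""
    return 0.5 * m * v * v


m, v = 1.0, 1.0  # arbitrary units
for c in (1.0, 10.0, 100.0, 1000.0):
    h, h0 = kinetic_exact(m, v, c), kinetic_newton(m, v)
    print(f"c = {c:7.1f}:  H = {h:.10f},  relative deviation = {(h0 - h) / h0:.2e}")
```

The rationalized form is a deliberate choice: computing $p_0c - mc^2$ directly subtracts two nearly equal floating-point numbers when $v \ll c$ and loses accuracy.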



General spacetime. The generalization of Minkowski spacetime is a manifold with a pseudo-Riemannian metric g that is locally like a Minkowski spacetime. Thus around every point there is a chart and a coordinate system such that, at that point, g takes the form of a Minkowski metric. It is clear that a proper description of general relativity requires differential geometry and the development of tensor calculus.

In general relativity, but also already in special relativity, physicists use some conventions that are worth explaining. Spacetime indices indicating components of four-vectors are denoted by Greek letters μ, ν, .... To denote a four-vector $x = (x^\mu)$ one writes simply $x^\mu$. If an index appears once upstairs and once downstairs, it is to be summed over; this is called the Einstein convention. Derivatives are objects with indices downstairs: $\partial_\mu = \partial/\partial x^\mu$. The Kronecker delta $\delta^\mu_\nu$ is an invariant tensor, and we have $\partial_\mu x^\nu = \delta^\nu_\mu$ and $\delta^\mu_\mu = 4$.

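The Minkowski metric is usually denoted by the Greek letter η, and again one usually just writes $\eta_{\mu\nu}$ to denote the metric and not just the μν-component; as a matrix the Minkowski metric is given by

$$\eta_{\mu\nu} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$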

The Minkowski inner product is now written as $v \cdot w = v^\mu w^\nu \eta_{\mu\nu}$. If v and w are two elements of the tangent space at a point x, their inner product in general relativity is given by $v^\mu w^\nu g_{\mu\nu}(x)$, from which it is clear that general relativity is the curved generalization of special relativity. To denote the metric g, physicists often write down a line element, which gives the squared length $ds^2$ of an infinitesimal displacement $dx = (dx^\mu)$:

$$ds^2 = g_{\mu\nu}(x)\,dx^\mu dx^\nu.$$

The metric $g_{\mu\nu}$ and its pointwise inverse $g^{\mu\nu}$ are used to lower and to raise indices; indeed, the metric gives an isomorphism between the tangent space and the cotangent space. Hence $\partial^\mu$ is defined as $g^{\mu\nu}\partial_\nu$, and a check of consistency gives $g^{\mu\nu} = g^{\mu\rho}g^{\nu\lambda}g_{\rho\lambda}$. As a further exercise in the conventions the reader might verify $g^{\mu\lambda}g_{\lambda\nu} = \delta^\mu_\nu$ and $g^{\mu\nu}g_{\mu\nu} = 4$. These conventions are used a lot in the physics literature, and more on their nature and why they work can be found in many textbooks on relativity, e.g., in the nice introductory textbook by d'Inverno [62].
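The index identities above can be verified mechanically. This sketch stores $g_{\mu\nu}$ as a 4×4 matrix, computes $g^{\mu\nu}$ as its inverse in exact rational arithmetic, and writes each Einstein sum as an explicit loop. Besides η, it uses a hypothetical non-diagonal metric G of signature (-,+,+,+), chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

R = range(4)  # spacetime indices 0, 1, 2, 3


def inverse(g):
    """Exact inverse of an invertible square matrix by Gauss-Jordan elimination."""
    n = len(g)
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(g)]
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)  # assumes g is invertible
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def delta(mu, nu):
    """Kronecker delta."""
    return int(mu == nu)


def check_conventions(g_lo):
    """Verify three index identities for a metric g_{mu nu} given as a 4x4 matrix."""
    g_up = inverse(g_lo)  # g^{mu nu}
    # g^{mu lam} g_{lam nu} = delta^mu_nu   (Einstein sum over lam)
    ok1 = all(sum(g_up[mu][lam] * g_lo[lam][nu] for lam in R) == delta(mu, nu)
              for mu, nu in product(R, R))
    # g^{mu nu} g_{mu nu} = delta^mu_mu = 4   (sum over mu and nu)
    ok2 = sum(g_up[mu][nu] * g_lo[mu][nu] for mu, nu in product(R, R)) == 4
    # g^{mu nu} = g^{mu rho} g^{nu lam} g_{rho lam}   (sum over rho and lam)
    ok3 = all(sum(g_up[mu][rho] * g_up[nu][lam] * g_lo[rho][lam] for rho, lam in product(R, R))
              == g_up[mu][nu] for mu, nu in product(R, R))
    return ok1, ok2, ok3


ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Hypothetical non-diagonal metric of signature (-, +, +, +), for illustration only.
G = [[-2, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 3]]
print(check_conventions(ETA), check_conventions(G))
```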

The symmetry group of general relativity on a manifold M with a pseudo-Riemannian metric g is huge; it consists of all diffeomorphisms of the manifold, as any diffeomorphism transforms a metric into another metric (its pullback). The vector fields on M describe the infinitesimal generators of the group of diffeomorphisms.



The reason that one uses the integration measure

$$\frac{d^3k}{(2\pi)^3\,2\omega(k)}, \qquad \omega(k) = \sqrt{c^2k^2 + \left(\frac{mc^2}{\hbar}\right)^2}, \qquad (1.26)$$

is Lorentz covariance. The integration measure is clearly rotation invariant. Hence, to study the behavior under a general Lorentz transformation L, we may assume that L only mixes the x-direction and the time direction. In that case we have, using the γ-factor (1.25),

$$L(k_x) = \frac{k_x - v k_t}{\sqrt{1-v^2}} = \gamma(k_x - v k_t), \qquad L(k_y) = k_y, \qquad L(k_z) = k_z,$$

$$L(k_t) = \frac{k_t - v k_x}{\sqrt{1-v^2}} = \gamma(k_t - v k_x).$$
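One easily checks that $L(k)^2 = k^2$, so the boosted wave vector again satisfies $L(k_t) = \omega(L(k))$. Since ω(k) is the zeroth component of the wave vector k, the factors $\sqrt{1-v^2}$ cancel out: at fixed $k_y, k_z$ we have (with c = 1) $dk_t = d\omega(k) = (k_x/\omega(k))\,dk_x$, hence $dL(k_x) = \gamma(1 - v k_x/\omega(k))\,dk_x = (L(k_t)/\omega(k))\,dk_x$, so that $d^3L(k)/\omega(L(k)) = d^3k/\omega(k)$.

Another way to see the covariance is to note the equality

$$\frac{d^3k}{(2\pi)^3\,2\omega(k)} = \frac{d^4k}{(2\pi)^3}\,\delta\big(k_t^2 - \omega(k)^2\big)\,\theta(k_t), \qquad (1.27)$$

where θ is the Heaviside function defined by θ(x) = 1 if x > 0 and θ(x) = 0 if x < 0. The Heaviside function selects the positive root of the equation (2.17). The equality of the two integration measures in (1.27) is proved by integrating the right-hand side over $k_t$, using $\delta(k_t^2 - \omega^2)\,\theta(k_t) = \delta(k_t - \omega)/(2\omega)$. It is clear that the expression (1.27) is invariant under Lorentz transformations.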
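Both arguments for covariance can be tested numerically. The sketch below works in units with c = 1 and replaces $mc^2/\hbar$ by an assumed parameter `MU`. It checks that the Jacobian of the boosted on-shell wave vector equals $\omega(L(k))/\omega(k)$, and, approximating the delta function by a narrow Gaussian (an assumption of the sketch), that $\int f(k_t)\,\delta(k_t^2-\omega^2)\,\theta(k_t)\,dk_t = f(\omega)/(2\omega)$.

```python
import math

MU = 1.0  # assumed stand-in for m c^2 / hbar, in units with c = 1


def omega(kx, ky, kz):
    """Positive-frequency root omega(k) = sqrt(k^2 + MU^2) of the dispersion relation."""
    return math.sqrt(kx * kx + ky * ky + kz * kz + MU * MU)


def boosted_kx(kx, ky, kz, v):
    """x-component of the boosted wave vector on the mass shell (k_t = omega(k))."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (kx - v * omega(kx, ky, kz))


def boosted_kt(kx, ky, kz, v):
    """Time component of the boosted on-shell wave vector."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return g * (omega(kx, ky, kz) - v * kx)


# 1) Jacobian of k -> L(k) equals omega(L(k)) / omega(k) (central finite difference).
kx, ky, kz, v, h = 0.7, -0.4, 1.1, 0.6, 1e-6
jac = (boosted_kx(kx + h, ky, kz, v) - boosted_kx(kx - h, ky, kz, v)) / (2 * h)
ratio = boosted_kt(kx, ky, kz, v) / omega(kx, ky, kz)
print(math.isclose(jac, ratio, rel_tol=1e-6))


# 2) Delta-function identity behind (1.27), with a Gaussian approximation of delta.
def smeared_delta(x, eps):
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2 * math.pi))


def shell_integral(f, w, eps=1e-3, half_window=0.05, n=20_000):
    """Midpoint rule for the integral of f(kt) delta(kt^2 - w^2) theta(kt) near kt = w."""
    a = max(0.0, w - half_window)  # theta(kt) restricts to kt > 0
    step = (w + half_window - a) / n
    total = 0.0
    for i in range(n):
        kt = a + (i + 0.5) * step
        total += f(kt) * smeared_delta(kt * kt - w * w, eps) * step
    return total


w = omega(kx, ky, kz)
print(math.isclose(shell_integral(math.exp, w), math.exp(w) / (2 * w), rel_tol=1e-4))
```

The integration window only needs to cover a neighborhood of $k_t = \omega$, since the Gaussian is negligible elsewhere.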



The goal of this book is to introduce the ideas relating quantum mechanics, Lie algebras and 
Lie groups, motivating everything as far as possible by classical mechanics. We shall mostly 
be concerned with systems with finite-dimensional phase spaces; the infinite-dimensional 
case is too difficult for a presentation at the level of this book. However, we present the 
concepts in such a way that they are valid even in infinite dimensions, and select the material 
so that it provides some insight into the infinite-dimensional case. 

Chapter 2 discusses systems of classical oscillators, starting with ordinary differential equations modeling nonlinearly coupled, damped oscillators, and introducing some notions from classical mechanics - the Hamiltonian (energy), the notion of conservative and dissipative dynamics, and the notion of phase space. We then look in more detail into the single oscillator case, the classical anharmonic oscillator, and show that the phase space dynamics can be represented either in terms of Hamilton's equations, or in terms of Poisson brackets and the classical Heisenberg equation of motion. Since the Poisson bracket satisfies the Jacobi identity, this gives the first link to Lie algebras. Considering the special case of harmonic oscillators, we show that they naturally arise from eigenmodes of linear partial differential equations describing important classical fields: the Maxwell equations for beams of light and gamma rays, the Schrödinger equation and the Klein-Gordon equation for nonrelativistic and relativistic beams of alpha rays, respectively, and the Dirac equation for beams of beta rays.

Chapter 3 relates the dynamics of arbitrary systems to those of oscillators by coupling the latter to the system, and exploring the resulting frequency spectrum. The observation that experimental spectra often have a pronounced discrete structure (analyzed in more detail in Chapter 11) is explained by the fact that the discrete spectrum of a quantum Hamiltonian is directly related to the observed spectrum via the quantum Heisenberg equation of motion. Indeed, the observed spectral lines have frequencies which, multiplied by Planck's constant, equal differences of eigenvalues of the Hamiltonian. This naturally explains the Rydberg-Ritz combination principle that had been established about 30 years before the birth of modern quantum theory. An excursion into the early history of quantum mechanics paints a colorful picture of this exciting time when the modern world view got its nearly definite shape. We then discuss general properties of the spectrum of a system consisting of several particles, and how it reflects the bound state and scattering structure of the multiparticle dynamics. Finally, we show how black body radiation, the phenomenon whose explanation by Planck initiated the quantum era, is related to the spectrum via elementary statistical mechanics.
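As an illustration of the combination principle, the following sketch uses the Bohr levels $E_n = -R/n^2$ of hydrogen (fine structure ignored; R ≈ 13.605693 eV and h = 4.135667696e-15 eV s are the only inputs, and hydrogen is chosen here purely as an example) and computes line frequencies as energy differences divided by Planck's constant h. The frequency of the 3 → 1 line is then the sum of the 3 → 2 and 2 → 1 frequencies.

```python
import math

RYDBERG_EV = 13.605693         # Rydberg energy in eV (infinite nuclear mass)
PLANCK_EV_S = 4.135667696e-15  # Planck's constant h in eV*s


def energy(n):
    """Bohr energy levels E_n = -R/n^2 of hydrogen, ignoring fine structure."""
    return -RYDBERG_EV / n ** 2


def line_frequency(upper, lower):
    """Frequency nu (in Hz) of the line upper -> lower, from h*nu = E_upper - E_lower."""
    return (energy(upper) - energy(lower)) / PLANCK_EV_S


nu_21 = line_frequency(2, 1)  # Lyman-alpha
nu_32 = line_frequency(3, 2)  # Balmer H-alpha
nu_31 = line_frequency(3, 1)  # Lyman-beta
print(f"H-alpha: {nu_32:.4e} Hz, wavelength {299_792_458 / nu_32 * 1e9:.1f} nm")
print(math.isclose(nu_31, nu_32 + nu_21))  # Rydberg-Ritz combination
```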

Part II discusses statistical mechanics from an algebraic perspective, concentrating on thermal equilibrium but discussing the basic concepts in a more general framework. A treatment
of equilibrium statistical mechanics and the kinematic part of nonequilibrium statistical 
mechanics is given. From a single basic assumption (Definition 6.1.1) the full structure of 
phenomenological thermodynamics and of statistical mechanics is derived, except for the 
third law which requires an additional quantization assumption. 

Chapter 4 gives a description of standard phenomenological equilibrium thermodynamics for 
single-phase systems in the absence of chemical reactions and electromagnetic fields. Section 
4.1 states the assumptions needed in an axiomatic way that allows an easy comparison with 
the statistical mechanics approach discussed in later chapters, and derives the basic formulas 
for the thermodynamic observables. Section 4.2 then discusses the three fundamental laws 
of thermodynamics; their implications are discussed in the remainder of the chapter. In 
particular, we derive the conventional formulas that express the thermodynamic observables 
in terms of the Helmholtz and Gibbs potentials and the associated extremal principles. 

Chapter 5 introduces the technical machinery of statistical mechanics, Gibbs states and the 
partition function, in a uniform way common to classical mechanics and quantum mechan- 
ics. Section 5.1 introduces the algebra of quantities and their basic realizations in classical 
and quantum mechanics. Section 5.2 defines Gibbs states, their partition functions, and 
the related KMS condition, and illustrates the concepts by means of the canonical ensemble 
and harmonic oscillators. The abstract properties of Gibbs states are studied in Section 
5.3, using the Kubo product and the Gibbs-Bogoliubov inequality. These are the basis of 
approximation methods, starting with mean field theory, and we indicate the connections. 
However, since approximation methods are treated abundantly in common texts, elsewhere in the present book we discuss only exact results. The final Section 5.4 discusses limit
resolutions for the values of quantities, and the associated uncertainty relations. 

Chapter 6 rederives the laws of thermodynamics from statistical mechanics, thus putting the 
phenomenological discussion of Chapter 4 on more basic foundations. Section 6.1 defines 
thermal states and discusses their relevance for global, local, and microlocal equilibrium. 
Section 6.2 deduces the existence of an equation of state and connects the results to the 
phenomenological exposition in Section 4.1. Section 6.3 proves the first law of thermody- 
namics. In Section 6.4, we compare thermal states with arbitrary Gibbs states and deduce 
the extremal principles of the second law. Section 6.5 shows that the third law is related 
to a simple quantization condition for the entropy and relates it to the time-independent 
Schrödinger equation.

In Chapter 7 we discuss in more detail the relation between mathematical models of phys- 
ical systems and reality. Through a discussion of the meaning of uncertainty, statistics, 
and probability, the abstract setting introduced in the previous chapters is given both a 
deterministic and a statistical interpretation. Section 7.1 discusses questions relating to different thermal models constructed on the basis of the same Euclidean *-algebra by selecting
different lists of extensive quantities. Section 7.2 discusses the hierarchy of equilibrium de- 
scriptions and how they relate to each other. Section 7.3 reviews the role of statistics in 
the algebraic approach to statistical mechanics. Section 7.4 gives an operational meaning 
to classical instruments for measuring the value of uncertain quantities, and to statistical 
instruments whose measurement results are only statistically correlated to properties of the 
measured system. Section 7.5 extends the discussion to quantum systems, and discusses the 
deterministic and statistical features of quantum mechanics. Section 7.6 relates the subject 
to information theory, and recovers the usual interpretation of the value of the entropy as 
a measure of unobservable internal complexity (lack of information). The final Section 7.7 
discusses the extent to which an interpretation in terms of subjective probability makes 
sense, and clarifies the relations with the maximum entropy principle. 

Part III introduces the basics about Lie algebras and Lie groups, with an emphasis on the 
concepts most relevant to the conceptual side of physics. 

Chapter 8 introduces Lie algebras. We introduce in Section 8.1 the basic definitions and 
tools for verifying the Jacobi identity, and establish the latter for the Poisson bracket of 
a single harmonic oscillator and in Section 8.2 for algebras of derivations in associative 
algebras. Noncommutative associative algebras give rise to Lie algebras in a different way - 
via commutators, discussed in Section 8.3. The fact that linear operators on a vector space 
form a Lie algebra brings quantum mechanics into the picture. Differential equations in 
associative algebras defining exponentials naturally produce Lie groups and the exponential 
map, which relates Lie groups and Lie algebras. In Section 8.4, we discuss classical groups 
and their Lie algebras. Taking as the vector space the space of n-dimensional column vectors gives as basic examples the Lie algebra of n × n matrices and its most important Lie
subalgebras, the orthogonal, symplectic, and unitary Lie algebras. Many finite-dimensional 
Lie groups arise as groups of square invertible matrices, and we discuss the most important 
families, in particular the unitary, orthogonal, and symplectic groups. We then discuss 
Heisenberg algebras and Heisenberg groups and their relation to the Poisson bracket of 
harmonic oscillators via the canonical commutation relations. The product law in Heisen- 
berg groups is given by the famous Weyl relations, which are an exactly representable case 
of the Baker-Campbell-Hausdorff formula valid for many other Lie groups, in particular 
for arbitrary finite-dimensional ones. We end the chapter with a treatment of the slightly richer structure of a Lie *-algebra usually encountered in the mechanical applications. In traditional terms, Lie *-algebras are equivalent to complexifications of real Lie algebras, but the *-formulation is often more suitable for discussing physics.

Chapter 9 brings more physics into play by introducing Poisson algebras, the algebras in 
which it is possible to define Hamiltonian mechanics. Poisson algebras abstract the alge- 
braic features of both Poisson brackets and commutators, and hence serve as a unifying tool 
relating classical and quantum mechanics. After defining the basic concepts in Section 9.1, 
we discuss in Section 9.2 rotating rigid bodies, in Section 9.3 the concept of angular mo- 
mentum, and the commutative Poisson algebra of smooth functions of angular momentum. 
It is directly related to the group SO(3) of 3-dimensional rotations and the corresponding Lie algebra so(3) of infinitesimal rotations, which is generated by the components of the
angular momentum. In particular, we obtain in Section 9.4 the Euler equations for the clas- 
sical spinning top from a Hamiltonian quadratic in the angular momentum. This example 
shows how the quantities of a classical Poisson algebra are naturally interpreted as physical 
observables. The angular momentum Poisson algebra is a simple instance of Lie-Poisson al- 
gebras, a class of commutative Poisson algebras canonically associated with any Lie algebra 
and constructed in Section 9.5. The Poisson bracket for the harmonic oscillator is another 
instance, arising in this way from the Heisenberg algebra. Thus Hamiltonian mechanics 
on Lie-Poisson algebras generalizes the classical anharmonic oscillator, and gives for so(3)
the dynamics of spinning rigid bodies. Sections on classical symplectic mechanics and its 
application to the dynamics of molecules and an outlook to quantum field theory conclude 
the chapter. 

Chapter 10 introduces representations of Lie algebras and Lie groups in associative alge- 
bras and in Poisson algebras. A general physical system can be characterized in terms of 
a Poisson representation of the kinematical Lie algebra of distinguished quantities of in- 
terest, a Hamiltonian, a distinguished Hermitian quantity in the Poisson algebra defining 
the dynamics, and a state defining a particular system at a particular time. We also in- 
troduce Lie algebra and Lie group representations in associative algebras, which relate Lie 
algebras and Lie groups of matrices or linear operators to abstract Lie algebras and Lie 
groups. These linear representations turn out to be most important for understanding the 
spectrum of quantum systems, as discussed later in Section 17.6. We then discuss unitary 
representations of the Poincaré group, the basis for relativistic quantum field theory. An overview of semisimple Lie algebras and their classification concludes the chapter.

Part IV gives an introduction to differential geometry from an algebraic perspective. 

Chapter 11 starts with an introduction to basic concepts of differential geometry. We define (smooth, infinitely often differentiable) manifolds and the associated algebra of scalar fields. Its derivations define vector fields, which form important examples of Lie algebras. The exterior calculus on alternating forms is developed. Finally, Lie groups are interpreted as manifolds.

Chapter 12 discusses the construction of Poisson algebras related to manifolds, and associ- 
ated Poisson manifolds, the arena for the most general classical dynamics. We show how 
classical symplectic mechanics (in flat phase space) and constrained Hamiltonian mechanics
fit into the general abstract picture. We end the chapter with a discussion of the Lagrangian 
approach to classical mechanics. 

Chapter 13 is about Hamiltonian quantum mechanics. We discuss a classical symplectic framework for the Schrödinger equation. This is then generalized to a framework for quantum-classical dynamics, including important models such as the Born-Oppenheimer approximation for the quantum motion of molecules. A section on deformation quantization relates classical and quantum descriptions of a system, and the Wigner transform makes the connection quantitatively useful in the special case of a canonical system with finitely many degrees of freedom.
The final Part V applies these concepts to the study of the dominant kinds of elementary
motion in a bound system, vibrations (described by oscillators, Poisson representations of 
the Heisenberg group), rotations (described by a spinning top, Poisson representations of 
the rotation group), and their interaction. On the quantum level, quantum oscillators are 
always bosonic systems, while spinning systems may be bosonic or fermionic depending on 
whether or not the spin is integral. The analysis of experimental spectra, concentrating on 
the mathematical contents of the subject, concludes our discussion. 

Chapter 14 is a study of harmonic oscillators (bosons, elementary vibrations), both from 
the classical and the quantum point of view. We introduce raising and lowering operators 
in the symplectic Poisson algebra, and show that the classical case is the limit ħ → 0 of
the quantum harmonic oscillator. The representation theory of the single-mode Heisenberg 
algebra is particularly simple since by the Stone-von Neumann theorem, all unitary repre- 
sentations are equivalent. We find that the quantum spectrum of a harmonic oscillator is 
discrete and consists of the classical frequency (multiplied by ħ) and its nonnegative integral
multiples (overtones, excited states). For discussing the representation where the harmonic 
oscillator Hamiltonian is diagonal, we introduce Dirac's bra-ket notation, and deduce the 
basic properties of the bosonic Fock spaces, first for a single harmonic oscillator and then 
for a system of finitely many harmonic modes. We then introduce coherent states, an over- 
complete basis representation in which not only the Heisenberg algebra, but the action of 
the Heisenberg group is explicitly visible. The coherent state representation is particularly 
relevant for the study of quantum optics, but we only indicate its connection to the modes 
of the electromagnetic field. 

Chapter 15 discusses spinning systems, again from the classical and the quantum per- 
spective. Starting with the Lie-Poisson algebra for the rotation group and a Hamiltonian 
quadratic in the angular momentum, we obtain the Euler equations for the classical spin- 
ning top. The quantum version can be obtained by looking for canonical anticommutation 
relations, which naturally produce the Lie algebra of a spinning top. As for oscillators, 
the canonical anticommutation relations have a unique irreducible unitary representation, 
which corresponds to a spin 1/2 representation of the rotation group. The multimode 
version gives rise to fermionic Fock spaces; in contrast to the bosonic case, these are finite- 
dimensional when the number of modes is finite. In particular, the single mode fermionic 
Fock space is 2-dimensional. Many constructions for bosons and fermions only differ in 
the signs of certain terms, such as commutators versus anticommutators. For example, 
quadratic expressions in bosonic or fermionic Fock spaces form Lie algebras, which give 
natural representations of the universal covering groups of the Lie algebras so(n) in the fermionic case and sp(2n, ℝ) in the bosonic case, the so-called spin groups and metaplectic groups, respectively. In fact, the analogies apart from sign lead to a common generalization of bosonic and fermionic objects in the form of super Lie algebras, which are, however, outside
the scope of the book. Apart from the Fock representation, the rotation group has a unique 
irreducible unitary representation of each finite dimension. We derive these spinor repre- 
sentations by restriction of corresponding nonunitary representations of the general linear 
group GL(2, ℂ) on homogeneous polynomials in two variables, and find corresponding spin
coherent states. 
Chapter 16 discusses highest weight representations, providing tools for classifying many
irreducible representations of interest. The basic ingredient is a triangular decomposition, 
which exists for all finite-dimensional semisimple Lie algebras, but also in other cases of 
interest such as the oscillator algebra, the Heisenberg algebra with the harmonic oscillator 
Hamiltonian adjoined. We look in detail at 4-dimensional Lie algebras with a nontrivial
triangular decomposition (among them the oscillator algebra and so(3)), which behave 
almost like the oscillator algebra. As a result, the analysis leading to Fock spaces generalizes 
without problems, and we are able to classify all irreducible unitary representations of the 
rotation group. 

The final Chapter 17 applies the Lie theoretic structure to the analysis of quantum spectra. 
After a short history of some aspects of spectroscopy, we look at the spectrum of bound 
systems of particles. We show how to obtain from a measured spectrum the spectrum of the 
associated Hamiltonian, and discuss qualitative results on vibrations (giving discrete spec- 
tra) and chemical reactions (giving continuous spectra) that come from the consideration of 
simple systems and the consideration of approximate symmetries. The latter are shown to 
result in a clustering of spectral values. The structure of the clusters is determined by how 
the irreducible representations of a dynamical Lie algebra split when the algebra is reduced 
to a subalgebra of generating symmetries. The clustering can also occur in a hierarchical 
fashion with fine splitting and hyperfine splitting, corresponding to a chain of subgroups. 
As an example, we discuss the spectrum of the hydrogen atom. 
Chapter 2



Classical oscillating systems 



This chapter starts the mathematical part of the book. We discuss in detail an important 
classical physical system, the harmonic or anharmonic oscillator, and its multivariate generalization, which describes systems of coupled oscillators such as macromolecules or
planetary systems. 

Understanding classical oscillators is of great importance in understanding many other 
physical systems. The reason is that an arbitrary classical system behaves close to equi- 
librium like a system of coupled linear oscillators. The equations we deduce are therefore 
approximately valid in many other systems. For example, a nearly rigid mechanical struc- 
ture such as a high-rise building always remains close enough to equilibrium so that it can 
be approximately treated as a linear oscillating system for the elements into which it is 
decomposed for computational purposes via the finite element method. 
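The statement that a system near equilibrium behaves like coupled linear oscillators can be made concrete: expanding the potential V to second order around a minimum $q_0$ gives oscillators with stiffness matrix $K = \nabla^2 V(q_0)$, and with mass matrix M the normal mode frequencies ω satisfy $\det(K - \omega^2 M) = 0$. The sketch below assumes a hypothetical pair of identical pendula (M = identity) coupled by a torsion spring, computes K by finite differences, and compares with the exact values $\sqrt{k}$ and $\sqrt{k + 2\kappa}$.

```python
import math


def V(q1, q2, k=1.0, kappa=0.25):
    """Hypothetical potential of two pendula (angles q1, q2) coupled by a torsion spring."""
    return k * (1 - math.cos(q1)) + k * (1 - math.cos(q2)) + kappa * (1 - math.cos(q1 - q2))


def hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of f at the point x (a list of floats)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(*y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H


def eig2_sym(H):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = H[0][0], H[0][1], H[1][1]
    mean, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    return mean - rad, mean + rad


K = hessian(V, [0.0, 0.0])                          # stiffness matrix at the equilibrium q0 = 0
omegas = [math.sqrt(lam) for lam in eig2_sym(K)]    # M = identity: omega^2 = eigenvalues of K
print(omegas)  # compare with sqrt(k) = 1 and sqrt(k + 2*kappa) = sqrt(1.5)
```

The in-phase mode does not stretch the coupling spring and keeps the single-pendulum frequency; the out-of-phase mode is stiffened by the coupling.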

We shall see that the equations of motion of coupled oscillators can be cast in a form that suggests a Lie algebra structure behind the formalism. This will provide the connection to
Part III of the book, where Lie algebras are in the center of our attention. 

Besides the (an-)harmonic oscillators we discuss some basic linear partial differential equations of physics: the Maxwell equations describing (among others) light and gamma rays, the Schrödinger equation and the Klein-Gordon equation (describing alpha rays), and the
Dirac equation (describing beta rays). The solutions of these equations can be represented 
in terms of infinitely many harmonic oscillators, whose quantization (not treated in this 
book) leads to quantum field theory. 



2.1 Systems of damped oscillators 

For any quantity x depending on time, differentiation with respect to time is denoted by a dot:

$$\dot x = \frac{d}{dt}x.$$






Analogously, n dots over a quantity represent differentiating this quantity n times with respect to time.

The configuration space is the space of possible positions that a physical system may attain, including external constraints. For the moment, we think of it as a subset of $\mathbb{R}^n$. A point in configuration space is generally denoted q. For example, for a system of N point masses, q is an N-tuple of vectors $q_k \in \mathbb{R}^3$ arranged below each other; each $q_k$ denotes the spatial coordinates of the kth moving point (planet, atom, node in a triangulation of the body of a car or building, etc.), so that n = 3N.

A system of damped oscillators is defined by the differential equation

$$M\ddot q + C\dot q + \nabla V(q) = 0. \qquad (2.1)$$

The reader wishing to see simple examples should turn to Section 2.2; here we explain the contents of equation (2.1) in general terms. As before, q is the configuration space point $q \in \mathbb{R}^n$. M and C are real $n\times n$-matrices, called the mass matrix and the friction matrix, respectively. The mass matrix is always symmetric and positive definite (and often diagonal, the diagonal entries being the masses of the components). The friction matrix need not be symmetric but is always positive semidefinite. The potential V is a smooth function from $\mathbb{R}^n$ to $\mathbb{R}$, i.e., $V \in C^\infty(\mathbb{R}^n, \mathbb{R})$, and $\nabla V$ is the gradient of V,

$$\nabla V(q) = \left(\frac{\partial V}{\partial q_1}(q), \dots, \frac{\partial V}{\partial q_n}(q)\right)^T.$$

Here the gradient operator ∇ is considered as a vector whose components are the differential operators $\partial/\partial q_k$. In finite-element applications in structural mechanics, the mass matrix is created by the discretization procedure for a corresponding partial differential equation. In general, the discretization may distribute the mass over all adjacent degrees of freedom. However, in many applications the mass matrix is diagonal,

$$M_{ij} = m_i\delta_{ij},$$

where $m_i$ is the mass corresponding to the coordinate $q_i$, and $\delta_{ij}$ is the Kronecker symbol (or Kronecker delta), which is 1 if i = j and zero otherwise. In the example where each $q_k$ is a three-vector denoting the position of an object, i is a multi-index i = (k, j), where k denotes an object index and j = 1, 2, 3 is the index of the coordinate of the kth object, which sits in position i of the vector q. Then $m_i$ is the mass of the kth object.

The quantity F defined by

$$F(q) := -\nabla V(q)$$

is the force on the system at the point q due to the potential V(q). We define the velocity v of the oscillating system by

$$v := \dot q.$$

The Hamiltonian energy H is then defined by

$$H := \frac{1}{2}v^T M v + V(q). \qquad (2.2)$$
The first term on the right-hand side is called the kinetic energy since it depends solely on the velocity of the system. The second term on the right-hand side is called the potential energy, and it depends on the position of the system. For more complex systems the potential energy can also depend on the velocities. Calculating the time derivative of the Hamiltonian energy H we get

$$\dot H = v^T M \dot v + \nabla V(q)\cdot\dot q = \dot q^T\big(M\ddot q + \nabla V(q)\big) = -\dot q^T C \dot q \le 0, \qquad (2.3)$$

where the first equality uses the symmetry of M, the last equality follows from the differential equation (2.1), and the final inequality follows since C is assumed to be positive semidefinite. If C = 0 (the idealized case of no friction) then the Hamiltonian energy is constant, $\dot H = 0$, and in this case we speak of conservative dynamics (the Hamiltonian H is conserved). If C is positive definite we have $\dot H < 0$ unless $\dot q = 0$, and there is energy loss. This is called dissipative dynamics. In the dissipative case, the sum of the kinetic and potential energy has to decrease.
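The energy balance can be observed in a simulation. The sketch integrates a one-dimensional instance of the damped oscillator equation with mass m = 1 and the assumed potential $V(q) = q^2/2 + q^4/4$, using the classical fourth-order Runge-Kutta scheme. With friction coefficient c = 0 the computed H stays constant up to discretization error; with c = 0.3 it decreases monotonically. Names and parameter values are illustrative.

```python
def V(q):
    """Assumed anharmonic potential V(q) = q^2/2 + q^4/4 (bounded below)."""
    return 0.5 * q * q + 0.25 * q ** 4


def dV(q):
    """Its gradient V'(q) = q + q^3."""
    return q + q ** 3


def energies(c, q0=1.0, v0=0.0, m=1.0, dt=0.01, steps=3000):
    """Integrate m q'' + c q' + V'(q) = 0 with classical RK4; return H = m v^2/2 + V(q) per step."""
    def rhs(q, v):
        return v, -(c * v + dV(q)) / m

    q, v = q0, v0
    out = [0.5 * m * v * v + V(q)]
    for _ in range(steps):
        k1 = rhs(q, v)
        k2 = rhs(q + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1])
        k3 = rhs(q + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1])
        k4 = rhs(q + dt * k3[0], v + dt * k3[1])
        q += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        out.append(0.5 * m * v * v + V(q))
    return out


conservative = energies(c=0.0)
dissipative = energies(c=0.3)
print(max(abs(h - conservative[0]) for h in conservative))  # small: H conserved up to RK4 error
print(dissipative[0], dissipative[-1])                       # H has decayed
```

Runge-Kutta schemes are not exactly energy-conserving, so in the conservative case H drifts very slightly; for long conservative runs a symplectic integrator would be the more usual choice.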

If the potential V is unbounded from below, it might happen that the system starts falling 
in a direction in which the potential is unbounded from below and the system becomes 
unphysical; the velocity could increase without limits. Thus, in a realistic and manage- 
able physical system, the potential is always bounded from below, and we shall make this 
assumption throughout. It follows that the Hamiltonian is bounded from below. 

Since in the dissipative case the Hamiltonian energy is decreasing and is bounded below, it will approach a limit as $t \to \infty$. Therefore (under mild regularity assumptions) $\dot H \to 0$, and by (2.3), $\dot q^T C \dot q \to 0$. Since C is positive definite for a dissipative system, this forces $\dot q \to 0$. Thus the velocities get smaller and smaller, and asymptotically the system approaches a state with $\dot q \approx 0$, at the level of the accuracy of the model. Typically this implies that q tends to some constant value $q_0$. Note that it does not follow rigorously that q tends to a constant value; it is possible that $q \to \infty$. Nevertheless, if we assume that q does not walk away to infinity, then it follows from $\dot q = 0$ that $\ddot q = 0$, so that (2.1) implies $\nabla V(q_0) = 0$, and we conclude that q tends to a stationary point $q_0$ of the potential. If this is a saddle point, small perturbations can (and will) cause the system to move towards another stationary point. Because of such stability reasons, the system ultimately moves towards a local minimum.

In practice, the perturbations come from imperfections in the model. Remember that the 
deterministic equation (2.1) is a mathematical idealization of the real world situation. A 
more appropriate model (but still an approximation) is the equation 

$$M\ddot q + C\dot q + \nabla V(q) = \varepsilon,$$

where $\varepsilon$ is a stochastic force describing the imperfections of the model. Typically, these are already sufficient to guarantee with probability 1 that the system will not end up in a saddle point. Usually, imperfections are small, irregular jumps due to friction (see, e.g., Bowden & Leben [44]), or Brownian motion due to kicks by molecules of a solvent. See, e.g., Brown [47], Einstein & Brown [69], Garcia & Palacios [87], and Hänggi & Marchesoni [108]; for an overview of Brownian motion with lots of historical references and citations, see Duplantier [65], and for a discussion in the context of protein folding, Neumaier [182, Section 4].
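
The following minimal Euler-Maruyama sketch illustrates how a stochastic force keeps the system from remaining at a saddle point. It integrates the one-dimensional version of the equation above and models $\varepsilon$ as Gaussian white noise of a hypothetical strength `sigma`. It assumes the hypothetical double-well potential $V(q) = q^4/4 - q^2/2$, whose barrier top $q = 0$ plays the role of the saddle (in one dimension, a stationary point with negative curvature). Without noise, a particle placed exactly at $q = 0$ with zero velocity stays there forever. With even weak noise, it leaves and settles near one of the minima $q = \pm 1$.

```python
import math
import random


def dV(q):
    # derivative of the hypothetical double well V(q) = q^4/4 - q^2/2;
    # q = 0 is a stationary point with negative curvature (the barrier top)
    return q**3 - q


def euler_maruyama(q0, v0, sigma, m=1.0, c=0.5, dt=0.01, steps=5000, seed=1):
    """Integrate m q'' + c q' + V'(q) = eps, eps = Gaussian white noise of strength sigma.

    Each step adds a random momentum kick of standard deviation sigma * sqrt(dt).
    """
    rng = random.Random(seed)
    q, v = q0, v0
    for _ in range(steps):
        kick = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        q, v = q + v * dt, v + (-(c * v + dV(q)) * dt + kick) / m
    return q, v


print(euler_maruyama(0.0, 0.0, sigma=0.0))   # deterministic: stays at the saddle
print(euler_maruyama(0.0, 0.0, sigma=0.05))  # noisy: ends up near q = 1 or q = -1
```

The fixed seed only makes the run reproducible. Which of the two minima is reached depends on the random kicks.
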

In many cases, the potential $V(q)$ has several local minima. Our argument so far says
that the state of the system will usually move towards one of these local minima. Around 
the local minimum it can oscillate for a while, and in the absence of stochastic forces it 
will ultimately get into one of the local minima. If we assume that there are stochastic 
imperfections, we can say even more! 

Suppose that the local minimum towards which the system tends is not a global minimum. Then occasional stochastic perturbations may suffice to push (or kick) the system over a barrier separating the local minimum from a valley leading to a different minimum. Such a barrier is characterized by a saddle point, a stationary point where the Hessian of the potential has exactly one negative eigenvalue. The energy needed to pass the barrier, called the activation energy, is simply the difference between the potential energy of the separating saddle point and the potential energy of the minimum. In a simple, frequently used approximation, the negative logarithm of the probability of exceeding the activation energy in a given time span is proportional to the activation energy. This implies that small barriers are easy to cross, while high barriers are difficult to cross. In particular, if a system can cross a barrier between a high-lying minimum and a much lower-lying minimum, it is much more likely to cross it in the direction of the lower minimum than in the other direction. This means that (averaged over a population of many similar systems) most systems will spend most of their time near low minima, and if the energy barriers between the different minima are not too high, most systems will be close to the global minimum most of the time. Thus a global minimum characterizes an absolutely stable equilibrium, while other local minima are only metastable equilibrium positions, which can become unstable under sufficiently large stochastic perturbations.
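
The approximation just stated can be made concrete. The sketch below assumes a hypothetical tilted double-well potential in one dimension, where the separating saddle point is a local maximum of $V$. It also assumes a hypothetical proportionality constant `BETA`. The code locates the two minima and the barrier by bisection, computes both activation energies, and compares the resulting crossing probabilities.

```python
import math

TILT = 0.1  # hypothetical asymmetry; makes the right-hand minimum the higher (metastable) one


def V(q):
    return q**4 / 4 - q**2 / 2 + TILT * q


def dV(q):
    return q**3 - q + TILT


def d2V(q):
    return 3 * q**2 - 1


def bisect(f, a, b, tol=1e-12):
    """Root of f in [a, b]; assumes f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        mid = (a + b) / 2
        fm = f(mid)
        if (fm > 0) == (fa > 0):
            a, fa = mid, fm
        else:
            b = mid
    return (a + b) / 2


# Stationary points of V; the brackets were chosen by hand for TILT = 0.1.
q_low = bisect(dV, -2.0, -0.5)    # global minimum
q_saddle = bisect(dV, -0.5, 0.5)  # barrier (V'' < 0 there)
q_high = bisect(dV, 0.5, 2.0)     # metastable local minimum

E_up = V(q_saddle) - V(q_low)     # activation energy for leaving the global minimum
E_down = V(q_saddle) - V(q_high)  # activation energy for leaving the metastable minimum

BETA = 20.0  # hypothetical constant in  -log(probability) = BETA * activation energy
ratio = math.exp(-BETA * E_down) / math.exp(-BETA * E_up)
print(f"E_down = {E_down:.4f}, E_up = {E_up:.4f}, downhill/uphill ratio = {ratio:.1f}")
```

The ratio grows exponentially with the difference of the two activation energies. This is why, in this approximation, populations accumulate in the lower minimum.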

There are famous relations called fluctuation-dissipation theorems that assert (in a 
quantitative way) that friction is related to stochastic (i.e., not directly modeled high fre- 
quency) interactions with the environment. In particular, if a system is sufficiently well 
isolated, both friction and stochastic forces become negligible, and the system can be de- 
scribed as a conservative system. Of course, from a fundamental point of view, the only 
truly isolated system is the universe as a whole, since at least electromagnetic radiation 
escapes from all systems not enclosed in an opaque container, and systems confined to a 
container interact quite strongly with the walls of the container (or else the walls would not be able to confine the system).
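
The simplest instance of a fluctuation-dissipation relation concerns a free particle obeying the Langevin equation $m\dot v = -cv + \varepsilon$, where $\varepsilon$ is white noise of strength $\sigma$. Thermal equilibrium at temperature $T$ requires $\sigma^2 = 2ck_BT$. The friction constant $c$ therefore fixes the size of the random forces, and the mean kinetic energy obeys equipartition, $\langle mv^2\rangle = k_BT$. The following sketch samples $v$ with the exact Ornstein-Uhlenbeck update and checks this numerically. The parameter values are arbitrary assumptions.

```python
import math
import random


def mean_square_velocity(m, c, kT, dt=0.5, steps=200_000, seed=0):
    """Time average of v^2 for m dv = -c v dt + sigma dW with sigma^2 = 2 c kT.

    Uses the exact Ornstein-Uhlenbeck transition over a step dt, so no stability
    restriction on dt applies.
    """
    rng = random.Random(seed)
    sigma2 = 2 * c * kT                       # fluctuation-dissipation relation
    decay = math.exp(-c / m * dt)             # friction damps v by this factor per step
    stationary_var = sigma2 / (2 * c * m)     # equals kT / m
    step_std = math.sqrt(stationary_var * (1 - decay**2))
    v, total = 0.0, 0.0
    for _ in range(steps):
        v = v * decay + step_std * rng.gauss(0.0, 1.0)
        total += v * v
    return total / steps


for m, c, kT in [(2.0, 1.0, 0.5), (1.0, 3.0, 2.0)]:
    print(m, c, kT, mean_square_velocity(m, c, kT), kT / m)
```

If the noise were chosen independently of $c$, the stationary kinetic energy would not match the temperature of the environment. The relation $\sigma^2 = 2ck_BT$ is exactly the condition that makes the two agree.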

Thus on a fundamental level, a conservative system describes the whole universe from the 
tiniest microscopic details to the largest cosmological facts. Such a system would have to 
be described by a quantum field theory that combines the successful standard model of 
particle physics with general relativity. At present, no such theory is available. 

On the other hand, conservative systems form a good first approximation to many small and practically relevant systems, which justifies spending most of the book on the conservative case.

The phase space formulation. So far, our discussion was framed in terms of position and velocity. As we shall see, the Hamiltonian description is most powerful in phase space coordinates. Here everything is expressed in terms of the phase space observables $q$ and $p$, where

$$p := Mv$$

is called the momentum of the oscillating system. The phase space for a system of $n$ oscillators is the space of points $(q,p) \in \mathbb{R}^n \times \mathbb{R}^n$. A state (in the classical sense) is a point $(q,p)$ in phase space. The Hamiltonian function (or simply the Hamiltonian) is the function defining the Hamiltonian energy in terms of the phase space observables $p$ and $q$. In our case, since a positive definite matrix is always invertible, we can express $v$ in terms of $p$ as $v = M^{-1}p$, and find that

$$H(p,q) = \tfrac12\, p^T M^{-1} p + V(q). \qquad (2.4)$$

Note that $H$ does not depend explicitly on time. (In this book, we only treat such cases;
but in problems with time-dependent external fields, an explicit time dependence would be 
unavoidable.) 
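
As a small check of (2.4), the following sketch computes the Hamiltonian twice: once from the velocity, as $\tfrac12 v^TMv + V(q)$, and once in phase space form, as $\tfrac12 p^TM^{-1}p + V(q)$ with $p = Mv$. The mass matrix $M$, the state and the quadratic potential $V$ are hypothetical. Both values must agree.

```python
def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def V(q):
    return 0.5 * dot(q, q)                 # hypothetical potential energy


M = [[2.0, 1.0], [1.0, 3.0]]               # hypothetical positive definite mass matrix
q = [0.3, -0.1]
v = [1.0, -2.0]

p = mat_vec(M, v)                           # momentum p := M v
H_velocity = 0.5 * dot(v, mat_vec(M, v)) + V(q)
H_phase = 0.5 * dot(p, mat_vec(inverse_2x2(M), p)) + V(q)   # equation (2.4)
print(p, H_velocity, H_phase)
```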



2.2 The classical anharmonic oscillator 

To keep things simple, we concentrate on the case of a single degree of freedom. Everything 
said has a corresponding generalization to systems of coupled oscillators, but the essentials 
are easier to see in the simplest case. 

The simple anharmonic oscillator is obtained by taking $n = 1$. The differential equation (2.1) reduces to a scalar equation

$$m\ddot q + c\dot q + V'(q) = 0, \qquad (2.5)$$

where the prime denotes differentiation with respect to $q$. This describes, for example, the behavior of an object attached to a spring; then $q$ is the length of the spring, $m = M$ is the mass of the object, $c = C$ is the friction constant (collectively accounting for friction with the air, some friction in the spring itself, or friction with a surface if the object is lying on one), and $V(q)$ describes the potential energy (see below) the spring has when extended or contracted to length $q$. Note that a constant shift in the potential does not alter the equations of motion of an anharmonic oscillator; hence the potential is determined only up to a constant shift.

The harmonic oscillator is the special case of the anharmonic oscillator defined by a potential of the form

$$V(q) = \tfrac{k}{2}(q - q_0)^2, \qquad k > 0,$$

where $q_0$ is the equilibrium position of the spring. (Strictly speaking, only oscillators that are not harmonic should be called anharmonic, but we follow the mathematical practice where limiting cases are taken to be special cases of the generic concept: a linear function is also nonlinear, and a real number is also complex.) In this case, the force becomes

$$F(q) = -\nabla V(q) = -k(q - q_0). \qquad (2.6)$$

The equation (2.6) is sometimes called Hooke's law, which asserts that the force needed to pull a spring from equilibrium is linear in the deviation $q - q_0$ from equilibrium, a valid approximation when $q - q_0$ is small. Since the force is minus the gradient of the potential, the potential has to be quadratic to reproduce Hooke's law. It is customary to shift the potential such that it vanishes in global equilibrium; then one gets the above form, and stability of the equilibrium position dictates the sign of $k$. Note that the shift does not change the force, hence has no physical effect.

The mathematical pendulum is described by the potential

$$V(q) = k(1 - \cos q), \qquad k > 0, \qquad (2.7)$$

where $q$ is now the angle of deviation from the equilibrium, measured in radians. Looking at small $q$ we can approximate as follows:

$$V(q) = \tfrac{k}{2}q^2 + O(q^4),$$

and after dropping the error term, we end up with a harmonic oscillator. The same argument allows one to approximate an arbitrary anharmonic oscillator by a harmonic oscillator as long as the oscillations around a stable equilibrium position are small enough.
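
A quick numerical look at the error term, assuming $k = 1$: the difference between the pendulum potential $k(1-\cos q)$ and its harmonic approximation $kq^2/2$ shrinks like $q^4$. Dividing it by $q^4$ approaches $-k/24$, which is the next coefficient of the Taylor series of the cosine.

```python
import math


def pendulum_V(q, k=1.0):
    return k * (1 - math.cos(q))    # potential (2.7) of the mathematical pendulum


def harmonic_V(q, k=1.0):
    return k * q**2 / 2             # small-angle (harmonic) approximation


for q in (0.4, 0.2, 0.1, 0.05):
    err = pendulum_V(q) - harmonic_V(q)
    print(f"q = {q:5.2f}   error = {err: .3e}   error / q^4 = {err / q**4: .6f}")
```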

For $q$ not small the mathematical pendulum is far from harmonic. Physically this is clear: stretching a (good) spring further and further is harder and harder, but pushing the one-dimensional pendulum 'far' from its equilibrium position is really different. After rotating it by $\pi$ radians the pendulum is upside down, and pushing it further no longer costs energy.
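
The anharmonicity shows up directly in the period. Consider the undamped pendulum $m\ddot q = -k\sin q$ released from rest at amplitude $\theta_0 < \pi$. Its exact period is $T = 2\pi/(\omega_0\,\mathrm{AGM}(1, \cos(\theta_0/2)))$ with $\omega_0 = \sqrt{k/m}$, where AGM is the arithmetic-geometric mean (a classical result, equivalent to the complete elliptic integral formula). The sketch below assumes $m = k = 1$. It shows that $T$ equals the harmonic value $2\pi$ only for small amplitudes and grows without bound as $\theta_0$ approaches $\pi$, the upside-down position.

```python
import math


def agm(a, b, iterations=40):
    """Arithmetic-geometric mean of two positive numbers (converges quadratically)."""
    for _ in range(iterations):
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a


def pendulum_period(amplitude, m=1.0, k=1.0):
    """Exact period of m q'' = -k sin q for a release from rest at |q| = amplitude < pi."""
    omega0 = math.sqrt(k / m)          # angular frequency of the harmonic approximation
    return 2 * math.pi / (omega0 * agm(1.0, math.cos(amplitude / 2)))


for amplitude in (0.01, 0.5, 1.0, math.pi / 2, 3.0, 3.1):
    print(f"amplitude {amplitude:.3f}: T = {pendulum_period(amplitude):.4f}")
```

A harmonic oscillator would give the same period for every amplitude. The growth of $T$ is therefore a direct measure of how far the pendulum is from harmonic.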



Dynamics in phase space. We now restrict to conservative systems and analyze the conservative anharmonic oscillator ($\dot H = 0$) a bit more. Since $c = 0$, the differential equation (2.5) simplifies to

$$m\ddot q + V'(q) = 0. \qquad (2.8)$$

The Hamiltonian energy is given by

$$H = \tfrac12 m v^2 + V(q).$$

Note the form of the kinetic energy familiar from school. Expressed in terms of the phase space observables $p$ and $q$ (which are now scalar variables, not vectors), we have

$$\dot q = v = \frac{p}{m}, \qquad \dot p = m\dot v = m\ddot q = -V'(q). \qquad (2.9)$$

An observable is something you can calculate from the state; simple examples are the velocity and the kinetic, potential, or Hamiltonian energy. Thus arbitrary observables can be written as smooth functions $f(p,q)$ of the phase space observables. In precise terms, an observable is (for an anharmonic oscillator) a function $f \in C^\infty(\mathbb{R}\times\mathbb{R})$. The required amount of smoothness can be reduced in practical applications; on the fundamental theoretical level, it pays to require infinite differentiability to get rid of troubling exceptions.

Introducing the shorthand notation

$$f_p := \frac{\partial f}{\partial p}, \qquad f_q := \frac{\partial f}{\partial q}$$

for partial derivatives, we can write the equations (2.9) in the form

$$\dot q = H_p, \qquad \dot p = -H_q. \qquad (2.10)$$

The equations (2.10) are called the Hamilton equations in state form. Although derived here only for the anharmonic oscillator, the Hamilton equations are of great generality; the equations of motion of many (unconstrained) conservative physical systems can be cast in this form, with more complicated objects in place of $p$ and $q$, and more complex Hamiltonians $H(p,q)$. A dynamical system governed by the Hamilton equations is called an isolated Hamiltonian system. If there are external forces, the system is not truly isolated, but the Hamilton equations are still valid in many cases, provided one allows the Hamiltonian to depend explicitly on time, $H = H(p,q,t)$; in this case, additional partial derivatives with respect to time appear in several of our formulas.
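
The Hamilton equations (2.10) are also a convenient starting point for numerical integration. For separable Hamiltonians $H = T(p) + V(q)$, a common choice is the Störmer-Verlet (leapfrog) scheme, which alternates half steps in $p$ with full steps in $q$. The sketch below assumes such a separable Hamiltonian. For the hypothetical anharmonic oscillator $H = p^2/(2m) + q^4/4$, the energy error stays small and bounded over long times.

```python
def leapfrog(p, q, H_p, H_q, dt, steps):
    """Integrate q' = H_p, p' = -H_q for a separable Hamiltonian (H_p depends on p only,
    H_q on q only) with the Stoermer-Verlet / leapfrog scheme."""
    for _ in range(steps):
        p = p - dt / 2 * H_q(q)      # half kick:  p' = -H_q
        q = q + dt * H_p(p)          # drift:      q' =  H_p
        p = p - dt / 2 * H_q(q)      # half kick
    return p, q


m = 1.0                                    # hypothetical mass


def H(p, q):
    return p**2 / (2 * m) + q**4 / 4       # hypothetical anharmonic oscillator


def H_p(p):
    return p / m


def H_q(q):
    return q**3


p0, q0 = 0.0, 1.0
p1, q1 = leapfrog(p0, q0, H_p, H_q, dt=0.01, steps=10_000)
print(H(p0, q0), H(p1, q1))
```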

Calculating the time dependence of an arbitrary observable $f$, we get

$$\dot f = f_p\dot p + f_q\dot q = -f_p H_q + f_q H_p,$$

hence

$$\dot f = H_p f_q - H_q f_p. \qquad (2.11)$$

In particular, for $f = q$ or $f = p$ we recover (2.10). Thus this formulation is equivalent to the Hamilton equations. We call (2.11) the Hamilton equations in general form. Let us apply (2.11) to $f = H$ and calculate the change of the Hamiltonian:

$$\dot H = H_p H_q - H_q H_p = 0,$$

which is consistent since we knew from the start that energy is conserved, $\dot H = 0$. But now this relation has been derived for arbitrary isolated Hamiltonian systems!

When external forces are present, we have to consider time-dependent observables $f(p,q,t)$. In this case, we have in place of (2.11)

$$\dot f = f_p\dot p + f_q\dot q + f_t = H_p f_q - H_q f_p + f_t, \qquad (2.12)$$

and in particular, for the Hamiltonian,

$$\dot H = H_t = \partial H/\partial t.$$
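
The relation $\dot H = \partial H/\partial t$ can be checked along a numerical solution. The sketch assumes a hypothetical driven oscillator $H(p,q,t) = p^2/2 + q^2/2 - Fq\cos(\Omega t)$. It integrates the Hamilton equations together with an auxiliary variable $w$ that accumulates $\int H_t\,dt$, so $H(T) - H(0)$ should equal $w(T)$.

```python
import math


def H(p, q, t, F, Omega):
    return p**2 / 2 + q**2 / 2 - F * q * math.cos(Omega * t)


def integrate(q0, p0, T, F, Omega, steps=10_000):
    """RK4 for q' = H_p, p' = -H_q and w' = H_t (partial time derivative of H)."""
    def rhs(t, y):
        q, p, _ = y
        return (p,                                        # H_p
                -(q - F * math.cos(Omega * t)),           # -H_q
                F * Omega * q * math.sin(Omega * t))      # H_t
    y, t, dt = (q0, p0, 0.0), 0.0, T / steps
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k1)))
        k3 = rhs(t + dt / 2, tuple(a + dt / 2 * b for a, b in zip(y, k2)))
        k4 = rhs(t + dt, tuple(a + dt * b for a, b in zip(y, k3)))
        y = tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
        t += dt
    return y


F, Omega, T = 0.5, 1.3, 10.0       # hypothetical driving parameters
q, p, w = integrate(1.0, 0.0, T, F, Omega)
change = H(p, q, T, F, Omega) - H(0.0, 1.0, 0.0, F, Omega)
print(change, w)                   # the two numbers should agree
```

With $F = 0$ the Hamiltonian no longer depends on $t$, $w$ stays zero, and $H$ is conserved, as in the isolated case.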



However, we restrict the subsequent discussion to the isolated case. 

The Hamilton equations can be cast in a form that turns out to be even more general and very useful. It brings us directly to the heart of the subject of the present book. We define a binary operation $\angle$ on $C^\infty(\mathbb{R}\times\mathbb{R})$ as follows:

$$f \angle g := f_p g_q - g_p f_q.$$

Physicists write $\{g,f\}$ for $f \angle g$ and call it the Poisson bracket. Our alternative notation will turn out to be very useful, and generalizes in many unexpected ways. The equation (2.11) can then be written in the form of a classical Heisenberg equation

$$\dot f = H \angle f. \qquad (2.13)$$

It turns out that this equation, appropriately interpreted, is extremely general. It covers virtually all conservative systems of classical and quantum mechanics.

A basic and most remarkable fact, which we shall make precise in the following chapter, is that the vector space $C^\infty(\mathbb{R}\times\mathbb{R})$ equipped with the binary operation $\angle$ is a Lie algebra. We shall take this up systematically in Section 9.1.
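
For polynomial observables the Lie product can be computed exactly, which gives a quick way to check the properties claimed above. In the sketch below, a polynomial $\sum c_{ij}\,p^iq^j$ is stored as a dict mapping `(i, j)` to $c_{ij}$. This representation and all function names are hypothetical choices made for illustration. The checks confirm that $H\angle q$ and $H\angle p$ reproduce the Hamilton equations and that $H\angle H = 0$. They also confirm, on sample polynomials, that $\angle$ is antisymmetric and satisfies the Jacobi identity.

```python
from fractions import Fraction


def add(f, g, sign=1):
    """f + sign * g for polynomials stored as {(i, j): coefficient of p^i q^j}."""
    h = dict(f)
    for key, c in g.items():
        h[key] = h.get(key, 0) + sign * c
    return {key: c for key, c in h.items() if c != 0}


def mul(f, g):
    h = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            key = (i1 + i2, j1 + j2)
            h[key] = h.get(key, 0) + c1 * c2
    return {key: c for key, c in h.items() if c != 0}


def d_p(f):
    return {(i - 1, j): i * c for (i, j), c in f.items() if i > 0}


def d_q(f):
    return {(i, j - 1): j * c for (i, j), c in f.items() if j > 0}


def lie(f, g):
    """Lie product f ∠ g = f_p g_q - g_p f_q (the physicists' Poisson bracket {g, f})."""
    return add(mul(d_p(f), d_q(g)), mul(d_p(g), d_q(f)), sign=-1)


p, q = {(1, 0): 1}, {(0, 1): 1}
# hypothetical anharmonic oscillator with m = 2: H = p^2/4 + q^4/4 - q^2/2
H = {(2, 0): Fraction(1, 4), (0, 4): Fraction(1, 4), (0, 2): Fraction(-1, 2)}
print(lie(H, q))   # H_p = p/2, i.e. q' = p/m
print(lie(H, p))   # -V'(q) = -q^3 + q
```

Exact rational arithmetic is used so that the identities can be checked by equality rather than up to rounding errors.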



2.3 Harmonic oscillators and linear field equations 

Historically, radiating substances which produce rays of α-, β- or γ-particles were fundamental for gaining an understanding of the structure of matter. Even today, many experiments in physics are performed with rays (or beams, which is essentially the same) generated by some source and then manipulated in the experiments.

The oldest, most familiar rays are light rays, α-rays, β-rays, and γ-rays. (Nowadays, we also have neutron rays, etc., and cosmic rays contain all sorts of particles.)

All kinds of rays are described by certain quantum fields, obtained by quantizing corre- 
sponding classical field equations, linear partial differential equations whose time-periodic 
solutions provide the possible single-particle modes of the quantum fields. In the fol- 
lowing sections we look at these field equations in some detail; here we just make some 
introductory comments. 

α-rays are modes (realizations) of the field of doubly ionized helium, $\mathrm{He}^{2+}$, which is modeled on the classical level by a Schrödinger wave equation or a Klein-Gordon wave equation. β-rays are modes of a charged field of electrons or positrons, modeled on the classical level by a Dirac wave equation. For radiation of only positrons one uses the notation β⁺, and for rays with only electrons one uses β⁻. Both light rays and γ-rays are modes of the electromagnetic field, which is modeled on the classical level by the Maxwell wave equations. Their quantization (which we do not treat in this book) produces the corresponding quantum fields.

In the present context, the Schrödinger, Klein-Gordon, Dirac, and Maxwell equations are all regarded as classical field equations for waves in 3+1 dimensions, though they can also be regarded as the equations for a single quantum particle (a nonrelativistic or relativistic scalar particle, an electron, or a photon, respectively). This dual use is responsible for calling quantum field theory, the quantum version of the classical theory of these equations, second quantization. It also accounts for the particle-wave duality, the puzzling property that rays sometimes (e.g., in photodetection or a Geiger counter) behave like a beam of particles, and sometimes (in diffraction experiments, of which the double slit experiments are the most famous ones) like a wave; in the case of light, this is a century-old conflict dating back to the times of Newton and Huygens.

In the quantum field setting, quantum particles arise as eigenstates of an operator $N$ called the number operator. This operator has a discrete spectrum with nonnegative integer eigenvalues, counting the number of particles in an eigenstate. The ground state, with zero eigenvalue, is essentially unique and defines the vacuum; a single quantum particle corresponds to an eigenstate with eigenvalue 1 of $N$, and eigenstates with eigenvalue $n$ correspond to systems of $n$ particles. If a quantum system contains particles of different
types, each particle type has its own number operator. 
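
A finite-dimensional caricature shows the spectrum of the number operator. The sketch assumes the standard ladder-operator construction $N = a^*a$ with $a|n\rangle = \sqrt n\,|n-1\rangle$. It truncates to the states $|0\rangle,\dots,|n_{\max}\rangle$, a truncation made here only for illustration. The matrix of $N$ comes out diagonal with entries $0, 1, \dots, n_{\max}$. Its eigenvalues are therefore the particle numbers, and the state with eigenvalue 0 is the vacuum.

```python
import math


def annihilation(n_max):
    """Matrix of a on the truncated basis |0>, ..., |n_max>: a|n> = sqrt(n) |n-1>."""
    dim = n_max + 1
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)
    return a


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def number_operator(n_max):
    a = annihilation(n_max)
    return matmul(transpose(a), a)   # N = a* a (a is real here, so a* is the transpose)


N = number_operator(5)
print([round(N[n][n], 12) for n in range(6)])
```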

The states that are easy to prepare correspond to beams. The fact that beams have a 
fairly well-defined direction translates into the formal fact that beams are approximate 
eigenstates of the momentum operator. Indeed, often beams are well approximated by 
exact eigenstates of the momentum operator, which describe so-called monochromatic 
beams. (Real beams are at best quasi-monochromatic, a term we shall not explain.) Since 
the states of beams are not eigenstates of the number operator N, they contain an indefinite 
number of particles. 

All equations mentioned are linear partial differential equations, and behave just like a set of infinitely many coupled harmonic oscillators, one at each space position. They describe non-interacting fields in a homogeneous medium. The definition of interacting fields leaves the linear regime and leads into the heart of nonlinear field theory, both in a classical and a quantum version. This is outside the scope of the present book. However, when position space (or momentum space) is discretized so that only a finite number of degrees of freedom remain to describe a field, one is back to nonlinear oscillators, which can be understood completely on the basis of the treatment given here, and indeed, the number operator will play a prominent role in Part V of this book.

Fortunately, for understanding beam experiments, it usually suffices to quantize a few modes of the classical field, and these are harmonic oscillators. Indeed, by separation of variables, the linear field equations can be decoupled in time, leading to a system of uncoupled harmonic oscillators forming the Fourier modes. Beams correspond to solutions which have a significant intensity only in a small neighborhood of a line in 3-space. (Quasi-)monochromatic beams are the beams corresponding to solutions which have an (almost) constant frequency. Thus interactions with beams can often be modelled as interactions with a few harmonic oscillators.
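
The decoupling into Fourier modes can be seen in a discretized toy version of a one-dimensional linear wave equation, which is an assumption made here for illustration. In it, $n$ oscillators on a ring with spacing $h$ obey $\ddot q_j = c^2(q_{j+1} - 2q_j + q_{j-1})/h^2$. Each Fourier mode $\cos(2\pi kj/n)$ is an eigenvector of the second difference. Such a mode therefore oscillates independently, as a single harmonic oscillator with angular frequency $\omega_k = (2c/h)|\sin(\pi k/n)|$. For long wavelengths this approaches the continuum value $c\cdot 2\pi k/(nh)$.

```python
import math


def second_difference(u, h):
    """(u[j+1] - 2 u[j] + u[j-1]) / h^2 on a ring (periodic boundary conditions)."""
    n = len(u)
    return [(u[(j + 1) % n] - 2 * u[j] + u[j - 1]) / h**2 for j in range(n)]


def fourier_mode(k, n):
    return [math.cos(2 * math.pi * k * j / n) for j in range(n)]


def mode_frequency(k, n, h, c):
    """Angular frequency of Fourier mode k of q_j'' = c^2 * second_difference(q)_j."""
    return 2 * c / h * abs(math.sin(math.pi * k / n))


n, h, c, k = 64, 0.1, 1.0, 5
u = fourier_mode(k, n)
omega = mode_frequency(k, n, h, c)
# eigenvector check: c^2 * (second difference of u) = -omega^2 * u, component by component
residual = max(abs(c**2 * d + omega**2 * x) for d, x in zip(second_difference(u, h), u))
print(omega, residual)
```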

On the other hand, when a beam containing all frequencies interacts with a system which 
oscillates only with certain frequencies, the beam will resonate with these frequencies. This 
allows the detection of a system's eigenfrequencies by observing its interaction with light 
or other radiation. This is the basis of spectroscopy, and will be discussed in more detail
in Chapter 3 and Chapter 17. 

Note that, just as the free Maxwell equations describe both classical electromagnetic waves (in particular light and γ-rays) and single photons (particles of the corresponding quantum field), so the Schrödinger equation, the Klein-Gordon equation, and the Dirac equation describe both classical fields for α- and β-rays and single α-particles or electrons and positrons (particles of the corresponding quantum field), respectively.

In the following, we consider four kinds of classical fields and their associated quantum 
particles, differing in spin and hence in the way rotations and Lorentz transformations 
affect the fields. 

• Slow bosons of spin zero, such as slow α-particles. The equation describing them is the Schrödinger equation.

• Fast bosons of spin zero, such as fast α-particles. The equation describing them is the Klein-Gordon equation.

• Fermionic particles of spin 1/2, like electrons, positrons and neutrinos. The dynamical equation in this case is the Dirac equation.

• Light and γ-rays; electromagnetic radiation. The corresponding particles are photons, which have spin 1. The field describing these particles is the electromagnetic field. The equations governing their dynamics are the Maxwell equations.

Because of the differing spin, there is a significant difference between β-rays and the others: α-particles and photons have integral spin and are therefore so-called bosons, while electrons and positrons have half-integral spin and are therefore so-called fermions. Only fermions are subject to the so-called Pauli exclusion principle, which is responsible for the extensivity of matter. This difference is reflected by the fundamental requirement that the fields of bosons, in particular of α-particles and photons, are quantized by imposing canonical commutation relations (discussed in Section 14.2), while fermions, and hence positrons and electrons, are quantized by imposing canonical anticommutation relations (discussed in Section 15.1).



2.4 Alpha rays 

We first consider rays consisting of α-particles: helium nuclei, i.e., helium atoms stripped of their two electrons, which consist of two protons and two neutrons. α-particles are released by heavier nuclei during certain processes in the nucleus. For example, some elements are α-radioactive, which means that a nucleus of type $A$ tends to go to a lower energy level, which it can do by emitting two of its protons and two of its neutrons. The result is thus two nuclei, a helium nucleus and a nucleus of type $A' \ne A$; schematically $A \to A' + \alpha$. α-particles are also released during nuclear splitting (fission). Moreover, the sun is emitting α-particles all the time; the sun produces heat by means of a chain of nuclear fusion reactions, during which some α-particles are produced. If the atmosphere were not there, life on earth would be impossible due to the bombardment of α-particles. That α-particles are not healthy has been in the news lately (in 2007), since the former Russian spy Litvinenko is said to have been killed by a small amount of polonium, which is an α-emitter.

An a-particle emitted from a radioactive nucleus typically has a speed of 15,000 kilometers 
per second. Although this might look very fast, it is only 5% of the speed of light, which 
means that for a lot of calculations a-particles can be considered nonrelativistically, that 
is, without using special relativity. For some more accurate calculations though, special 
relativity is required. 
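
The size of the relativistic correction can be estimated directly. The sketch compares the Newtonian kinetic energy $\tfrac12 mv^2$ with the relativistic one, $(\gamma - 1)mc^2$, for an α-particle at 15,000 km/s. It uses the approximate rest energy $mc^2 \approx 3727.38$ MeV of the α-particle, a standard tabulated value rounded here. The relative difference between the two energies is below 0.2%.

```python
import math

C_KM_PER_S = 299_792.458         # speed of light (exact, SI definition)
ALPHA_REST_ENERGY_MEV = 3727.38  # approximate rest energy m c^2 of an alpha particle


def kinetic_energies(speed_km_per_s, rest_energy_mev=ALPHA_REST_ENERGY_MEV):
    """Return (v/c, Newtonian kinetic energy, relativistic kinetic energy) in MeV."""
    beta = speed_km_per_s / C_KM_PER_S
    gamma = 1 / math.sqrt(1 - beta**2)
    newtonian = 0.5 * beta**2 * rest_energy_mev
    relativistic = (gamma - 1) * rest_energy_mev
    return beta, newtonian, relativistic


beta, e_newton, e_rel = kinetic_energies(15_000)
print(f"v/c = {beta:.4f}, Newtonian {e_newton:.3f} MeV, relativistic {e_rel:.3f} MeV, "
      f"relative difference {(e_rel - e_newton) / e_newton:.2%}")
```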

For the nonrelativistic α-particle we have to use the Schrödinger equation. For a particle of mass $m$ moving in a potential $V(x,y,z)$ the Schrödinger equation is given by

$$i\hbar\frac{\partial}{\partial t}\psi(x,y,z,t) = -\frac{\hbar^2}{2m}\Delta\psi(x,y,z,t) + V(x,y,z)\,\psi(x,y,z,t),$$

where $\psi$ is the wave function of the particle, and $\Delta = \nabla\cdot\nabla$ is the Laplace operator.

The wave function contains the information about the particle. The quantity $|\psi(x,y,z,t)|^2$ is the probability density for finding the particle at time $t$ in a given position $(x,y,z)$. For beam considerations, we take $V(x,y,z) = 0$. Since we shoot the α-particles in just one direction, we assume $\psi(x,y,z,t) = \phi(t)\chi(x)$. We obtain

$$i\,\frac{\dot\phi(t)}{\phi(t)} = -\frac{\hbar}{2m}\,\frac{\chi''(x)}{\chi(x)}, \qquad (2.14)$$

where the dot denotes the derivative with respect to time $t$ and the prime denotes the derivative with respect to the coordinate $x$. The left-hand side of (2.14) only depends on time $t$ and the right-hand side only on $x$, which implies that both sides are a constant (with the dimension of time$^{-1}$) independent of $t$ and $x$. We denote this constant by $\omega$ and obtain two linear ordinary differential equations for $\phi$ and $\chi$ with the solutions

$$\phi(t) = e^{-i\omega t}, \qquad \chi(x) = a\,e^{ikx} + b\,e^{-ikx}, \qquad k = \sqrt{2m\omega/\hbar},$$

where $a$ and $b$ are some constants; we have normalized the constant in front of $\phi$ to 1 since we are only interested in the product of $\phi$ and $\chi$. Note that we wrote the solution suggestively as if $\omega > 0$, and in fact, on physical grounds it is; the solutions with $\omega < 0$ are not integrable and hence cannot determine a probability distribution.

We can express $\omega$ in terms of $k$, which plays the role of the inverse wavelength, getting $\omega(k) = \hbar k^2/(2m)$. Reintroducing an arbitrary direction unit vector $\mathbf n$ and the wave vector $\mathbf k = k\mathbf n$, we obtain the dispersion relation of the Schrödinger equation,

$$\omega(\mathbf k) = \frac{\hbar\,\mathbf k\cdot\mathbf k}{2m}.$$
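
As a numerical sanity check of this dispersion relation, the sketch below uses hypothetical units with $\hbar = m = 1$. It evaluates the residual $i\hbar\psi_t + (\hbar^2/2m)\psi_{xx}$ of the free one-dimensional Schrödinger equation for the plane wave $\psi = e^{i(kx - \omega t)}$, using central finite differences. When $\omega = \hbar k^2/(2m)$, the residual is at the level of the discretization error. For a wrong choice of $\omega$, it is of order one.

```python
import cmath


def residual(x, t, k, omega, hbar=1.0, m=1.0, h=1e-3):
    """i hbar psi_t + hbar^2/(2m) psi_xx for psi = exp(i(kx - omega t)), via central differences."""
    def psi(x_, t_):
        return cmath.exp(1j * (k * x_ - omega * t_))

    psi_t = (psi(x, t + h) - psi(x, t - h)) / (2 * h)
    psi_xx = (psi(x + h, t) - 2 * psi(x, t) + psi(x - h, t)) / h**2
    return 1j * hbar * psi_t + hbar**2 / (2 * m) * psi_xx


k = 2.0
good = residual(0.7, 0.3, k, omega=k**2 / 2)   # omega = hbar k^2 / (2m) with hbar = m = 1
bad = residual(0.7, 0.3, k, omega=k**2)        # wrong dispersion relation
print(abs(good), abs(bad))
```
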

Therefore the general solution can be expanded as

$$\psi(\mathbf x,t) = \int d\mathbf k\,\Big(a(\mathbf k)\,e^{-i(\omega(\mathbf k)t - \mathbf k\cdot\mathbf x)} + b(\mathbf k)\,e^{-i(\omega(\mathbf k)t + \mathbf k\cdot\mathbf x)}\Big).$$

As explained before, the quantity $|\psi(x,y,z,t)|^2$ gives the probability density for finding the particle at position $(x,y,z)$ at time $t$. If we have an experiment with a great number of non-interacting particles, all of which have the same wave function $\psi$, the quantity $|\psi(x,y,z,t)|^2$ is proportional to the particle density. However, α-particles interact, and thus the Hamiltonian is different. If we assume the particle density is not too high, we can still assume that the α-particles move as if there were no other α-particles. Under this assumption we may again take $|\psi(x,y,z,t)|^2$ as the particle density. The energy density is then proportional to $|\psi(x,y,z,t)|^2$. Putting, as before, the whole experiment in a box of finite volume $V$, one can again arrive at a Hamiltonian corresponding to a collection of independent harmonic oscillators.

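Now we look at relativistic α-particles, and remind the reader of the notation introduced in Section 1.6. The dynamics of relativistic α-particles of mass $m$ is described by a real-valued function whose evolution is governed by the Klein-Gordon equation, which is given by

$$\Big(\Box - \frac{m^2c^2}{\hbar^2}\Big)\psi = 0 \qquad (2.15)$$

with the second order differential operator

$$\Box := -\frac{1}{c^2}\frac{\partial^2}{\partial t^2} + \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2},$$

called the d'Alembertian. Here $c$ is the speed of light. We look for wave-like solutions $\psi \propto e^{ik\cdot x}$ for some four-vector $k^\mu$. Note that $\partial_\mu e^{ik\cdot x} = ik_\mu e^{ik\cdot x}$ and hence $\Box e^{ik\cdot x} = -k^2 e^{ik\cdot x}$, where $k^2 = k\cdot k$. Hence we obtain the condition on $k$

$$k^2 + \frac{m^2c^2}{\hbar^2} = 0. \qquad (2.16)$$

Writing $k^0 = \omega/c$ and denoting the spatial part of $k$ by the boldface $\mathbf k$, we thus get the dispersion relation

$$\omega = \pm\sqrt{c^2\,\mathbf k\cdot\mathbf k + \frac{m^2c^4}{\hbar^2}}. \qquad (2.17)$$

In particular, for $\mathbf k = 0$ we see that $\hbar|\omega| = E = mc^2$, combining Einstein's famous formula $E = mc^2$ and Planck's law $E = \hbar\omega$. The solution for a given choice of sign of $\omega$ is expanded in Fourier terms and most often written as

$$\psi(\mathbf x,t) = \int \frac{d^3k}{(2\pi)^3\,2\omega(\mathbf k)}\Big(a(\mathbf k)\,e^{i(\mathbf k\cdot\mathbf x - \omega(\mathbf k)t)} + a(\mathbf k)^*\,e^{-i(\mathbf k\cdot\mathbf x - \omega(\mathbf k)t)}\Big),$$

where we used the Lorentz-invariant measure (1.26) involving $\omega(\mathbf k) = \sqrt{c^2\mathbf k^2 + m^2c^4/\hbar^2}$.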
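
The dispersion relation (2.17) can be compared with the Schrödinger case. The sketch below uses hypothetical units $\hbar = m = c = 1$ unless other values are passed. At $\mathbf k = 0$ the positive branch $\omega(\mathbf k)$ reduces to $mc^2/\hbar$. Subtracting this rest frequency leaves approximately $\hbar k^2/(2m)$ for small $|\mathbf k|$, the Schrödinger dispersion relation. The group velocity $d\omega/dk = c^2k/\omega$ stays below $c$.

```python
import math


def kg_omega(k, m=1.0, c=1.0, hbar=1.0):
    """Positive branch of (2.17): omega = sqrt(c^2 k^2 + m^2 c^4 / hbar^2)."""
    return math.sqrt(c**2 * k**2 + (m * c**2 / hbar) ** 2)


def schrodinger_omega(k, m=1.0, hbar=1.0):
    return hbar * k**2 / (2 * m)


def group_velocity(k, m=1.0, c=1.0, hbar=1.0):
    return c**2 * k / kg_omega(k, m, c, hbar)    # d omega / dk


k = 0.01
print(kg_omega(k) - 1.0, schrodinger_omega(k))   # rest frequency m c^2 / hbar = 1 removed
print([group_velocity(x) for x in (0.1, 1.0, 10.0, 1000.0)])
```
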

2.5 Beta rays

We now discuss beams composed of spin-1/2 particles, the β-particles. β-radiation is emitted by radioactive material. Unstable nuclei can lose some of their energy and go to a more stable nucleus by emitting β-particles. There are two kinds of β-particles: positively charged and negatively charged. The negatively charged version consists of nothing more than electrons. The positively charged counterpart consists of the antiparticles of the electrons, the so-called positrons.

Other examples of fermions are neutrinos. The sun emits a stream of neutrinos; in each second, there are approximately 10^^ neutrinos flying through your body (it depends on the latitude you are at, how big you are and whether you are standing or lying down, making a difference of a factor of 100 perhaps). Neutrinos fly very fast; the solar neutrinos travel at the speed of light (or very close to it). The reason they travel that fast is that neutrinos have zero or very tiny mass, and massless particles (such as photons) always travel at the speed of light. For a long time, neutrinos were believed to be massless; only recently has it become an established fact that at least one of the three generations of neutrinos must have a tiny positive mass. We do not feel anything of the many neutrinos coming from the sun and steadily passing through our body, because, unlike protons and electrons, they hardly interact with matter; for example, to absorb half of the solar neutrinos, one would need a solid lead wall around 10^^ meters thick! The reason is that they do not have charge: they are neutral.

To discuss the case that the particles shot by the beam are fermions, we have to use the Dirac equation. It is convenient to use the same conventions for dealing with relativistic particles as before. In addition to the previously introduced symbols, we now introduce the so-called γ-matrices. In four dimensions there are four of them, called $\gamma^0,\dots,\gamma^3$, and they satisfy

$$\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2\eta^{\mu\nu}. \qquad (2.18)$$

The associative algebra generated by the γ-matrices subject to the above relation is a Clifford algebra. There are several possibilities to find a representation for the γ-matrices in terms of 4×4-matrices; a frequently made choice is

$$\gamma^0 = i\sigma_1\otimes 1, \qquad \gamma^1 = \sigma_2\otimes 1, \qquad \gamma^2 = \sigma_3\otimes\sigma_1, \qquad \gamma^3 = \sigma_3\otimes\sigma_3,$$

where the $\sigma_j$ are the Pauli matrices (1.22). However, we only need the defining relation (2.18). We assemble the γ-matrices in a vector $(\gamma^0,\gamma^1,\gamma^2,\gamma^3)$, and inner products with vectors are given by $\gamma\cdot p = \gamma^\mu p_\mu = \gamma^\mu p^\nu\eta_{\mu\nu}$.

A fermion is described by a vector-like object $\psi$, which takes values in the spinor representation of the Lie algebra so(3,1). Hence we can think of $\psi$ as a vector with four components. In this case, the γ-matrices are 4×4-matrices; that such a representation exists is shown by the explicit construction above. We need a property of the γ-matrices, namely that they are traceless (in any representation). To prove this, take any $\gamma^\mu$ and choose another γ-matrix $\gamma^\nu$, $\nu \ne \mu$. By (2.18), $(\gamma^\nu)^2 = \eta^{\nu\nu} = \pm 1$ and $\gamma^\nu\gamma^\mu = -\gamma^\mu\gamma^\nu$; using also the cyclicity of the trace, we have

$$\operatorname{tr}\gamma^\mu = \eta^{\nu\nu}\operatorname{tr}(\gamma^\mu\gamma^\nu\gamma^\nu) = \eta^{\nu\nu}\operatorname{tr}(\gamma^\nu\gamma^\mu\gamma^\nu) = -\eta^{\nu\nu}\operatorname{tr}(\gamma^\mu\gamma^\nu\gamma^\nu) = -\operatorname{tr}\gamma^\mu.$$

Hence $\operatorname{tr}\gamma^\mu = 0$.

With these preliminaries the Dirac equation is given by

$$\Big(\gamma\cdot\partial + \frac{mc}{\hbar}\Big)\psi = 0.$$

Acting on the Dirac equation with $(\gamma\cdot\partial - \frac{mc}{\hbar})$ and using $(\gamma\cdot p)(\gamma\cdot p) = p^2$ for any four-vector $p^\mu$, we see that each component of the spinor obeys the Klein-Gordon equation (2.15).

We look for solutions of the form $\psi = u\,e^{-ik\cdot x}$. Putting this ansatz into the Dirac equation we obtain

$$\Big(i\gamma\cdot k - \frac{mc}{\hbar}\Big)u = 0, \qquad (2.19)$$

and the additional constraint $k^2 + m^2c^2/\hbar^2 = 0$ follows from the Klein-Gordon equation. Equation (2.19) can be written as

$$\Big(1 - i\frac{\hbar}{mc}\,\gamma\cdot k\Big)u = 0,$$

and it is easy to see that

$$P := \frac12\Big(1 + i\frac{\hbar}{mc}\,\gamma\cdot k\Big)$$

satisfies $P^2 = P$. Hence $P$ is a projection operator, and $\mathbb{C}^4$ splits as $\mathcal V \oplus \mathcal V'$ with $\mathcal V = P\mathbb{C}^4$ and $\mathcal V' = (1-P)\mathbb{C}^4$. The Dirac equation thus tells us that $u$ has to be in $\mathcal V$. Denote $p^\mu = \hbar k^\mu/(mc)$; then $p^2 = -1$. We now choose a frame moving along with the particle, so that in that frame the particle is not moving; hence we may choose $p^\mu = (1,0,0,0)$ in the chosen frame. It follows that $2P = 1 - i\gamma^0$. The eigenvalues of $i\gamma^0$ are $\pm1$ since $(i\gamma^0)^2 = 1$. But the γ-matrices are traceless, and hence the eigenvalues add up to 0. Therefore the eigenvalues of $i\gamma^0$ are $-1, -1, +1, +1$. Thus, in a suitable basis, $P$ can be cast in the form

$$P = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{pmatrix}.$$
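
The algebraic facts used above can be verified in an explicit representation. The sketch below assumes the metric $\eta = \mathrm{diag}(-1,1,1,1)$ and the particular choice $\gamma^0 = i\sigma_1\otimes 1$, $\gamma^1 = \sigma_2\otimes 1$, $\gamma^2 = \sigma_3\otimes\sigma_1$, $\gamma^3 = \sigma_3\otimes\sigma_3$, which is one valid choice among many. It checks the relation (2.18) and the tracelessness of the γ-matrices. It also checks that $P = \tfrac12(1 + i\gamma\cdot p)$ is a projection of rank 2 for a four-vector with $p^2 = -1$, both at rest and for a boosted particle.

```python
import math

I2 = [[1, 0], [0, 1]]
SIGMA1 = [[0, 1], [1, 0]]
SIGMA2 = [[0, -1j], [1j, 0]]
SIGMA3 = [[1, 0], [0, -1]]
ETA = [-1, 1, 1, 1]                    # diagonal of the metric eta_{mu nu} (assumed signature)


def kron(A, B):
    """Kronecker (tensor) product of square matrices given as nested lists."""
    m = len(B)
    size = len(A) * m
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(size)] for i in range(size)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def lin(a, A, b, B):
    """a*A + b*B for scalars a, b and matrices A, B."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def close(A, B, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(A, B) for x, y in zip(ra, rb))


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


I4 = kron(I2, I2)
ZERO4 = lin(0, I4, 0, I4)
GAMMA = [kron(lin(1j, SIGMA1, 0, I2), I2),   # i sigma_1 (x) 1
         kron(SIGMA2, I2),
         kron(SIGMA3, SIGMA1),
         kron(SIGMA3, SIGMA3)]

# (2.18): gamma^mu gamma^nu + gamma^nu gamma^mu = 2 eta^{mu nu}
clifford_ok = all(
    close(lin(1, matmul(GAMMA[mu], GAMMA[nu]), 1, matmul(GAMMA[nu], GAMMA[mu])),
          lin(2 * ETA[mu] if mu == nu else 0, I4, 0, I4))
    for mu in range(4) for nu in range(4))
traceless_ok = all(abs(trace(g)) < 1e-12 for g in GAMMA)


def projector(p_upper):
    """P = (1 + i gamma.p)/2 with gamma.p = gamma^mu eta_{mu mu} p^mu (p.p = -1 assumed)."""
    gamma_p = ZERO4
    for mu in range(4):
        gamma_p = lin(1, gamma_p, ETA[mu] * p_upper[mu], GAMMA[mu])
    return lin(0.5, I4, 0.5j, gamma_p)


for p_upper in [(1.0, 0.0, 0.0, 0.0), (math.cosh(0.7), math.sinh(0.7), 0.0, 0.0)]:
    P = projector(p_upper)
    print(p_upper, close(matmul(P, P), P), trace(P))
print(clifford_ok, traceless_ok)
```

A trace of 2 for a projection means a two-dimensional range, which matches the two polarizations of a fermion.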



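We conclude that there are two independent degrees of freedom for a fermion; similar to the case of light, one speaks of two polarizations. For a particular choice of the sign of $\omega_{\mathbf k}$ we can thus specialise the expansion of the fermion to

$$\psi(\mathbf x,t) = \int d^3k\,\Big(v_+(\mathbf k)\,e^{-i(\omega_{\mathbf k}t - \mathbf k\cdot\mathbf x)} + v_-(\mathbf k)\,e^{-i(\omega_{\mathbf k}t + \mathbf k\cdot\mathbf x)}\Big),$$

where $\omega_{\mathbf k} = c\sqrt{\mathbf k\cdot\mathbf k + m^2c^2/\hbar^2}$ and where the $v_\pm(\mathbf k)$ are linear combinations of the two basis polarization vectors $u_1$ and $u_2$:

$$v_\pm(\mathbf k) = \alpha_\pm(\mathbf k)\,u_1 + \beta_\pm(\mathbf k)\,u_2.$$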

2.6 Light rays and gamma rays

Lasers produce light of a high intensity and with almost only one frequency. That is, the light of a laser is almost monochromatic. We assume that the laser is perfect and thus emits only radiation of one particular wavelength. Also we consider 'general lasers', which can radiate electrons, α-particles, β⁺-radiation and so on. We briefly comment on the nature of the different kinds of radiation and see how the modes come into play. To make life easy for us, we imagine the laser is placed such that the medium through which the beam is shot is the vacuum.

First, the common situation where the radiation is light. Light waves are particular solutions 
to the Mctxwell equations in vacuum, or any other homogeneous medium. The Maxwell 
equations in vacuum are given by 

dB 

V-E = 0, VxE = -— , 

V-B = 0, VxB = lf , 

at 

where E is the electric field strength, B is the magnetic field strength and t is the time, and 
c is again the speed of light. As usual in physics, boldface symbols denote 3-dimensional 
vectors, while their components are not written in bold; 

V • A = div A := -— + -— + -— 

axi 0x2 0x3 

and 



/ dAs dA2 \ 



V X A = rot A := 



dX2 


8x3 


dAi 


OAs 


8X2. 


dxi 


dA2 


dAi 



\ dxi dx2 I 

denote the divergence and curl of a vector field A, respectively. Using the generally valid 
relation 

V X (V X X) = V(V • X) - V^X 

and the fact that the divergence of B and E vanishes we obtain from the Maxwell equations 
the wave equations 

2 1 "9^ 2 1 

To solve we use the ansatz 

E(x, y, z, t) = Ee'^^-'^-'' , B{x, y, z, t) - Be^'"*"^'"-^ , 

where E and B are now fixed vectors. The ansatz represents waves propagating in the 
k-direction and at any fixed point in space the measured frequency is cu. Prom the wave 
equations we immediately find the dispersion relations for the Maxwell equations that 
relate cu and k = {k^, ky, k^)'^; 

$$\omega = c|\mathbf k|,$$

where

$$|\mathbf k|^2 = \mathbf k\cdot\mathbf k = k_x^2 + k_y^2 + k_z^2.$$

Figure 2.1: An image of a solution of the electromagnetic wave equations. The Poynting
vector gives the direction of the wave and is perpendicular to the electric and the magnetic
field.

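We compute

$$\nabla\cdot\mathbf E(x,y,z,t) = -i\,\mathbf k\cdot\mathbf E(x,y,z,t) = 0,$$

so that $\mathbf k\cdot\mathbf E = 0$, and similarly $\mathbf k\cdot\mathbf B = 0$. Thus k is perpendicular to both E and B. For the
curls we find

$$\nabla\times\mathbf E(x,y,z,t) = -i\,\mathbf k\times\mathbf E(x,y,z,t),\qquad \nabla\times\mathbf B(x,y,z,t) = -i\,\mathbf k\times\mathbf B(x,y,z,t),$$

and thus it follows from the Maxwell equations that

$$\mathbf k\times\mathbf E = \omega\mathbf B,\qquad \mathbf k\times\mathbf B = -\frac{\omega}{c^2}\mathbf E.$$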

We see that E and B are perpendicular to each other, and k is perpendicular to E and B, 
hence k is parallel to the so-called Poynting vector P = E x B. Figure 2.1 displays an 
image of a solution. 
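As a quick numerical sanity check of the relations just derived, the following Python sketch builds a single plane wave with k along the z-axis and E along the x-axis (an illustrative choice; the 500 nm wavelength is arbitrary), computes $\omega = c|\mathbf k|$ and $\mathbf B = \mathbf k\times\mathbf E/\omega$, and confirms transversality and $\mathbf k\times\mathbf B = -(\omega/c^2)\mathbf E$. The helper names `dot`, `cross` and `plane_wave` are hypothetical and serve only this illustration.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def dot(a, b):
    """Inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_wave(k, e_amp):
    """Given a wave vector k and an amplitude E orthogonal to k, return
    omega = c|k| (dispersion relation) and the amplitude B = (k x E) / omega."""
    omega = C * math.sqrt(dot(k, k))
    b_amp = tuple(x / omega for x in cross(k, e_amp))
    return omega, b_amp

# Illustrative choice: k along z (wavelength 500 nm), E along x.
k = (0.0, 0.0, 2 * math.pi / 500e-9)
E = (1.0, 0.0, 0.0)
omega, B = plane_wave(k, E)

print(dot(k, E), dot(k, B), dot(E, B))                   # transversality: all zero
print(cross(k, B), tuple(-omega / C**2 * x for x in E))  # the two vectors agree
```

Since the amplitude of B comes out as $|\mathbf E|/c$, the check also shows why E and B enter the field energy with different weights in SI-type units.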

Without loss of generality we may change the coordinates so that k points in the z-
direction; then $k_x = k_y = 0$ and only $k_z$ is nonzero. Then, since $\omega\mathbf B = \mathbf k\times\mathbf E$ and E is
orthogonal to k, light is completely determined by giving the x- and y-components of E.
Thus light has two degrees of freedom; in other words, light has two polarizations.
Linearly polarized light is light where E oscillates in a constant direction orthogonal to the
light ray. Circularly polarized light is light where E rotates along the path of light; this
can be achieved by superimposing two linearly polarized light beams with orthogonal
polarization directions, equal amplitudes, and a phase difference of a quarter period. Since
the Maxwell equations are linear, any sum of solutions is again a solution. Note that to
actually get the physical solution for E(x, y, z, t), one has to take the real part.
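The statement about circular polarization can be illustrated with a minimal sketch, assuming a wave travelling along z with the time dependence $e^{i(\omega t - kz)}$ of the ansatz above and unit amplitudes. Adding the y-polarized beam with a factor $i$ (a quarter-period phase shift) to the x-polarized beam makes the real field rotate in the xy-plane with constant length, whereas the purely x-polarized field only oscillates along the x-axis. The function `real_field` is a hypothetical helper.

```python
import cmath
import math

def real_field(amplitude, omega, t, k=1.0, z=0.0):
    """Physical field Re[amplitude * exp(i(omega t - k z))] for a wave along z."""
    phase = cmath.exp(1j * (omega * t - k * z))
    return tuple((a * phase).real for a in amplitude)

omega = 2 * math.pi                 # period 1 (illustrative units)
eps1 = (1.0, 0.0, 0.0)              # linear polarization along x
eps2 = (0.0, 1.0, 0.0)              # linear polarization along y
# eps1 + i*eps2: the y-beam is shifted by a quarter period relative to the x-beam.
circular = tuple(a + 1j * b for a, b in zip(eps1, eps2))

for t in (0.0, 0.125, 0.25, 0.5):
    lin = real_field(eps1, omega, t)
    circ = real_field(circular, omega, t)
    print(f"t={t:.3f}  linear Ex={lin[0]:+.3f}  "
          f"circular E=({circ[0]:+.3f}, {circ[1]:+.3f})  "
          f"|E|={math.hypot(circ[0], circ[1]):.3f}")
```

Using $\varepsilon_1 - i\varepsilon_2$ instead would give the opposite sense of rotation.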

So we have seen that a light beam is determined by giving two polarizations. These
polarizations can be interpreted as modes of an oscillator. One can write the general solution in
terms of coefficient functions a(k) as




(2.20) 



where the frequency is given by the dispersion relation $\omega_{\mathbf k} = c|\mathbf k|$, and



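$$u_{\mathbf k}(\mathbf x) = \varepsilon_1(\mathbf k)\,e^{i\mathbf k\cdot\mathbf x} + \varepsilon_2(\mathbf k)\,e^{i\mathbf k\cdot\mathbf x},$$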



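where $\varepsilon_1(\mathbf k)$ and $\varepsilon_2(\mathbf k)$ are polarization vectors chosen to satisfy

$$\varepsilon_1(\mathbf k)\cdot\varepsilon_2(\mathbf k) = \varepsilon_1(\mathbf k)\cdot\mathbf k = \varepsilon_2(\mathbf k)\cdot\mathbf k = 0,\qquad \varepsilon_1(\mathbf k)^2 = \varepsilon_2(\mathbf k)^2 = 1.$$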



Note the similarity with the Maxwell equation. The main difference is in the dispersion
relation. In addition, since the field functions are now real, the coefficients of the positive
frequency part and the negative frequency part are related. The positive frequency part
$\mathbf E^+$ of the solution (2.20) is called the analytic signal of E; clearly
$\mathbf E = \mathbf E^+ + \overline{\mathbf E^+} = 2\operatorname{Re}\mathbf E^+$.

In the quantum theory one promotes the modes a(k) and a*(k) to operators. We treat the 
transition from the classical theory to the quantum theory in detail only for the harmonic 
oscillator, corresponding to a single monochromatic mode; see Chapter 14. 

To motivate the connection, we rewrite the Hamiltonian into a specific form that we will
later recognize as the Hamiltonian of a harmonic oscillator, thereby showing that the
Maxwell equations give rise to (an infinite set of) harmonic oscillators.

First we consider the system in a finite volume V to avoid some questions of finiteness.
In that case (since one has to impose appropriate boundary conditions), the integral over
wave vectors k for the electric field becomes a sum over a discrete (but infinite) set of
wave vectors. To get a sum over finitely many terms, one also has to remove wave vectors
with very large momentum; this corresponds to discretizing space. Getting a proper limit
is the subject of renormalization theory, which is beyond the scope of our presentation.
The mathematical details for interacting fields are still obscure; indeed, whether quantum
electrodynamics (QED) exists as a mathematically well-defined theory is one of the big
open questions in mathematical physics. The functions $u_{\mathbf k}(\mathbf x)$ can then be normalized as



The energy density of the electromagnetic field is proportional to $\mathbf E^2 + c^2\mathbf B^2$. Hence classically
the Hamiltonian is given by



We are not taking all details into account here, since we only want to convey the general picture of
what is happening and do not use the material outside this section.
Inserting the expansions (2.20) of E and B into the expression for the Hamiltonian and
taking into account the normalization of the $u_{\mathbf k}$, one obtains, after shifting the ground state
energy to zero and performing the so-called thermodynamic limit $V\to\infty$, a Hamiltonian
of the form



In Chapter 14 we will show that the quantum mechanical Hamiltonian of the harmonic
oscillator is given by $H = \hbar\omega a^*a$ for some constant $\omega$ and operators $a$ and $a^*$. For light we
thus obtain a quantum oscillator for each possible k-vector. In practice, a laser admits only
a selection of possible k-vectors. In the ideal case that there is only one possible k-vector,
that is, the Poynting vector can only point in one direction and only one wavelength is
allowed, the Hamiltonian reduces to the Hamiltonian of one harmonic oscillator.




Chapter 3

Spectral analysis



In this chapter we show that the spectrum of a quantum Hamiltonian (defining the admis- 
sible energy levels) contains very useful information about a conservative quantum system. 
It not only allows one to solve the Heisenberg equations of motion but also has a direct link 
to experiment, in that the differences of the energy levels are directly observable, since they 
can be probed by coupling the system to a harmonic oscillator with adjustable frequency. 



In quantum mechanics the classical Hamiltonian becomes an operator on some Hilbert
space. Formally, instead of an expression $H(q,p)$ defining a classical N-particle Hamiltonian
as a function of position q and momentum p, one has a similar expression where
now q and p are vectors whose components are linear operators on the Hilbert space,
depending on the representation used. In the position representation, the components
of q act as multiplication by position coordinates, while the components of p are multiples of
the differentiation operators with respect to the position coordinates; in the momentum
representation, this also holds but with position and momentum interchanged. Both
representations are equivalent.

The collection of eigenvalues of the Hamiltonian H of a quantum system is referred to as
the spectrum of H (or of the system). Formally, the spectrum of a linear operator H is
the set of all $E\in\mathbb C$ such that $H - E$ is not invertible. In finite dimensions, this implies
the nontrivial solvability of the equation $(H - E)\psi = 0$, and hence the existence of an
eigenvector $\psi\neq 0$ satisfying the time-independent Schrödinger equation

$$H\psi = E\psi. \qquad (3.1)$$

In infinite dimensions, things are a bit more complicated and require the spectral theorem
from functional analysis. If, however, the spectrum of H is discrete, then (3.1) remains
valid.



3.1 The quantum spectrum
As we shall show in Section 14.3, the Hamiltonian of a quantum harmonic oscillator in
normal mode form is given by $H = E_0 + \hbar\omega n$, where n is the so-called number operator
whose spectrum consists of the nonnegative integers. Hence the eigenvectors of H (also
called eigenfunctions if, as here, the Hilbert space consists of functions) are eigenvectors
of n, and the eigenvalues $E_k$ of H are related to the eigenvalues $k\in\mathbb N_0$ of n by the formula

$$E_k = E_0 + k\hbar\omega.$$

This shows that the eigenvalues of the quantum harmonic oscillator are quantized, and the
eigenvalue differences are integral multiples of the energy quantum $\hbar\omega$. That the spectrum
of the Hamiltonian is discrete is sometimes rephrased as 'H is quantized'.
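A small numerical illustration of $E_k = E_0 + k\hbar\omega$, under assumptions not taken from this chapter: we use the standard ladder-operator convention $a|k\rangle = \sqrt{k}\,|k-1\rangle$ (treated properly in Chapter 14), truncate to the first six basis states, and choose units with $\hbar\omega = 1$ and $E_0 = 1/2$. On this truncated space $n = a^*a$ is still exactly $\operatorname{diag}(0,1,\dots,5)$, so $H = E_0 + \hbar\omega n$ is diagonal, and all level differences are integral multiples of $\hbar\omega$. The helper names are hypothetical.

```python
import math

def ladder_matrices(n_max):
    """Annihilation operator a and its adjoint a* on the truncated basis
    |0>, ..., |n_max-1>, assuming the convention a|k> = sqrt(k)|k-1>."""
    a = [[0.0] * n_max for _ in range(n_max)]
    for k in range(1, n_max):
        a[k - 1][k] = math.sqrt(k)
    a_star = [list(col) for col in zip(*a)]   # real matrix: adjoint = transpose
    return a, a_star

def matmul(x, y):
    """Plain matrix product of nested lists."""
    return [[sum(x[i][m] * y[m][j] for m in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

N_MAX = 6
HBAR_OMEGA, E0 = 1.0, 0.5            # illustrative units: hbar*omega = 1, E_0 = 1/2
a, a_star = ladder_matrices(N_MAX)
n = matmul(a_star, a)                 # number operator n = a* a
H = [[(E0 if i == j else 0.0) + HBAR_OMEGA * n[i][j] for j in range(N_MAX)]
     for i in range(N_MAX)]

levels = [H[k][k] for k in range(N_MAX)]            # E_k = E_0 + k*hbar*omega
gaps = sorted({round(levels[k] - levels[l], 9)      # all positive level differences
               for k in range(N_MAX) for l in range(k)})
print(levels)
print(gaps)
```

The sorted list of differences contains only the multiples $\hbar\omega, 2\hbar\omega, \dots$, which is the discrete analogue of the statement above.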

In this and the next section we investigate the experimental meaning of the spectrum of
the Hamiltonian of an arbitrary quantum system. Since the Hamiltonian describes the
evolution of the system via the quantum Heisenberg equation (1.16), i.e.,

$$\dot f = \frac{i}{\hbar}[H, f],$$

one expects that the spectrum will be related to the time dependence of $f(t)$. To solve
the Heisenberg equation, we need to find a representation in which the Hamiltonian acts
diagonally.

In the case where the Hilbert space $\mathcal H$ is finite-dimensional, we can always diagonalize H,
since H is Hermitian. There is an orthonormal basis of eigenvectors of H, and fixing such
a basis we may represent all $\psi\in\mathcal H$ by their components $\psi_k$ in this basis, thus identifying
$\mathcal H$ with $\mathbb C^n$ with the standard inner product. In this representation, H acts as a diagonal
matrix whose diagonal entries are the eigenvalues corresponding to the basis of eigenvectors:

$$(H\psi)_k = E_k\psi_k.$$
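For the finite-dimensional case, the following Python sketch diagonalizes a hypothetical $2\times 2$ Hermitian matrix with the closed-form eigenvalue formula and checks that, in the coordinates $\psi_k = \langle v_k,\psi\rangle$ with respect to the orthonormal eigenbasis $v_k$, the Hamiltonian acts as $(H\psi)_k = E_k\psi_k$. The matrix entries and the state are arbitrary illustrative values, and the function names are hypothetical; larger matrices would normally be handled by a numerical library rather than explicit formulas.

```python
import cmath
import math

def eig_hermitian_2x2(a, b, d):
    """Eigenpairs (E_k, v_k) of the Hermitian matrix [[a, b], [conj(b), d]],
    a and d real, via the closed-form 2x2 formulas."""
    if b == 0:                                  # already diagonal
        return [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    mean, half_gap = (a + d) / 2, (a - d) / 2
    r = math.hypot(half_gap, abs(b))
    pairs = []
    for e in (mean - r, mean + r):
        v0, v1 = b, e - a                       # solves (a - e) v0 + b v1 = 0
        norm = math.sqrt(abs(v0) ** 2 + abs(v1) ** 2)
        pairs.append((e, (v0 / norm, v1 / norm)))
    return pairs

def apply(h, psi):
    """Matrix-vector product h psi."""
    return [sum(h[i][j] * psi[j] for j in range(2)) for i in range(2)]

def components(pairs, psi):
    """Coordinates psi_k = <v_k, psi> of psi in the orthonormal eigenbasis."""
    return [sum(v[i].conjugate() * psi[i] for i in range(2)) for _, v in pairs]

a, b, d = 1.0, 0.5 - 0.5j, -1.0                 # hypothetical Hermitian H
H = [[a, b], [b.conjugate(), d]]
pairs = eig_hermitian_2x2(a, b, d)
psi = [0.3 + 0.1j, -0.7j]                        # arbitrary state

lhs = components(pairs, apply(H, psi))                               # (H psi)_k
rhs = [e * c for (e, _), c in zip(pairs, components(pairs, psi))]   # E_k psi_k
print(lhs)
print(rhs)
```

The agreement of `lhs` and `rhs` uses only that H is Hermitian and the $E_k$ are real, which is exactly why the eigenbasis representation makes H act diagonally.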

In the case where the Hilbert space is infinite-dimensional and H is self-adjoint, an analogous
representation is possible, using the Gel'fand-Maurin theorem, also known under the
name nuclear spectral theorem. The theorem asserts that if H is self-adjoint, then
H can be extended into the dual space of the domain of definition of H; there it has a
complete family of eigenfunctions, which can be used to coordinatize the Hilbert space.
The situation is slightly complicated by the fact that the spectrum may be partially or
fully continuous, in which case the concept of a basis of eigenvectors no longer makes sense,
since the eigenvectors corresponding to points in the continuous spectrum are not
square integrable and hence lie outside the Hilbert space.

In the physics literature, the rigorous mathematical exposition is usually abandoned at this
stage, and one simply proceeds by analogy, choosing a set of labels of the eigenstates
and treating them "formally" as if they formed a discrete set. Often, the discreteness of
the spectrum is enforced verbally by artificially "putting the particles in a finite box" and
going to an infinite volume limit at the very end of the computations. The justification
for this approach is that most experiments are indeed very well localized; in letting two
protons collide at CERN we do not take interactions with particles on Jupiter into account.
Mathematically we thus put our system in a box. Since we do not want our system to
interact too much with the walls of the box, we take the box large enough. Having met this
requirement, one observes that the physical quantities do not depend on the precise
form and size of the box. To simplify the equations one then takes the size of the box
to infinity. Making this mathematically precise is quite difficult, though well understood
for nonrelativistic systems. In particular, for the part of the spectrum that becomes
continuous in this limit, the limits of the eigenvectors become generalized eigenvectors
lying no longer in the Hilbert space itself but in a distributional extension of the Hilbert
space, which must be discussed in the setting of a so-called Gelfand triple or rigged
Hilbert space; cf. Section 14.4.

In many cases of physical interest, these generalized eigenvectors come in two flavors,
depending on the boundary conditions imposed, resulting in two families of in-eigenstates
$|k\rangle_+$ and out-eigenstates $|k\rangle_-$ labelled by a set $\Omega$, which in the case of the harmonic
oscillator is $\Omega = \mathbb N_0$. (The bra-ket notation used here informally is made precise in Section 14.4.)
The in- and out-states are called so because they have a natural geometric interpretation in
scattering experiments (see Section 3.4). In addition to these eigenstates, there is a measure
$d\mu(k)$ on $\Omega$, and a spectral density $\rho(k)$ with real positive values such that every vector
in the Hilbert space has a unique representation in the form

$$\psi = \int_\Omega d\mu(k)\,\psi_+(k)\,|k\rangle_+ = \int_\Omega d\mu(k)\,\psi_-(k)\,|k\rangle_-.$$

For any fixed choice of the sign in $\psi(k) := \psi_\pm(k)$, the inner product is given by

$$\phi^*\psi = \int_\Omega d\mu(k)\,\rho(k)\,\overline{\phi(k)}\,\psi(k). \qquad (3.2)$$

The spectral measure $d\mu(k)$ may also have a discrete part corresponding to square integrable
eigenstates, in which case $|k\rangle_+ = |k\rangle_-$. If all eigenvectors are square integrable, the spectrum
is completely discrete. In particular, this is the case for the harmonic oscillator, for which
we construct the diagonal representation explicitly in Chapter 14.

Since the $|k\rangle_\pm$ are eigenvectors with corresponding eigenvalue $E(k) = E_k$, that is, the
Hamiltonian satisfies

$$(H\psi)(k) = E(k)\,\psi(k), \qquad (3.3)$$

we say that H acts diagonally in the representation defined by the $\psi(k)$. Thus one
can identify the Hilbert space with the space $L^2(\Omega)$ of coefficient functions $\psi_\pm$ with finite
$\int_\Omega d\mu(k)\,\rho(k)\,|\psi_\pm(k)|^2$; the Hamiltonian is then determined by (3.3). The in- and out-states
are related by the so-called S-matrix, a unitary matrix $S\in\operatorname{Lin}L^2(\Omega)$ such that

As a consequence of the time-symmetric nature of conservative quantum dynamics and the 
time-asymmetry of scattering eigenstates, the in-representation and the out-representation 
are both equivalent to the original representation on which the Hamiltonian is defined. 
In many cases of interest, one can then rigorously prove existence and uniqueness of the 
S-matrix. 
The transformation from an arbitrary Hilbert space representation to the equivalent
representation in terms of which H is diagonal is an analogue of a Fourier transformation; the
latter corresponds to the special case where $\mathcal H = L^2(\mathbb R)$ and H is a differential operator
with constant coefficients.
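The remark about the Fourier transformation can be made concrete in a discrete setting. As an assumption for illustration, we replace $L^2(\mathbb R)$ by functions on a periodic lattice of N points and $d^2/dx^2$ by the second difference operator; the Fourier modes $e^{2\pi ijk/N}$ are then eigenvectors with eigenvalues $-4\sin^2(\pi k/N)$, so the discrete Fourier transform diagonalizes this constant-coefficient operator. This is only an analogue of the continuum statement, not a proof of it, and the helper names are hypothetical.

```python
import cmath
import math

N = 8  # number of points of the periodic lattice (illustrative)

def second_difference(psi):
    """Periodic discrete analogue of d^2/dx^2 with unit grid spacing."""
    n = len(psi)
    return [psi[(j + 1) % n] - 2 * psi[j] + psi[j - 1] for j in range(n)]

def fourier_mode(k, n=N):
    """Discrete plane wave exp(2 pi i j k / n), j = 0, ..., n-1."""
    return [cmath.exp(2j * math.pi * j * k / n) for j in range(n)]

errors = []
for k in range(N):
    mode = fourier_mode(k)
    eigenvalue = -4 * math.sin(math.pi * k / N) ** 2   # symbol of the operator
    image = second_difference(mode)
    errors.append(max(abs(x - eigenvalue * m) for x, m in zip(image, mode)))
print(max(errors))   # round-off level: every Fourier mode is an eigenvector
```

In the continuum limit the eigenvalue $-4\sin^2(\pi k/N)$, rescaled by the grid spacing, tends to $-p^2$ for the corresponding wave number p, which is how $d^2/dx^2$ acts after an ordinary Fourier transformation.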

In general, the Gel'fand-Maurin theorem guarantees the existence of a topological space
$\Omega$ and a Borel measurable spectral density function $\rho:\Omega\to\mathbb R_+$ such that the original
Hilbert space is $L^2(\Omega,\rho)$ with inner product (3.2) and such that (3.3) holds. Indeed, $\Omega$
can be constructed as the set of characters, that is, *-homomorphisms into the complex
numbers, of a maximal commutative C*-algebra of bounded linear operators containing the
bounded operators $e^{itH}$ ($t\in\mathbb R$). (Since we don't use this construction further, the concepts
involved will not be explained in detail.)

The above reasoning is completely parallel to the finite-dimensional case, where $\mathcal H = \mathbb C^n$.
There one would write $\psi = \sum_k \psi_k|k\rangle$ and have $(H\psi)_k = E_k\psi_k$. An arbitrary quantity
$f\in\operatorname{Lin}\mathcal H = \mathbb C^{n\times n}$ would then be represented by a matrix, acting as $(f\psi)_k = \sum_l f_{kl}\psi_l$.
In the infinite-dimensional setting, k takes values in the label space $\Omega$. The quantities of
primary interest are represented by integral operators defined by a kernel function:

$$(f\psi)(k) = \int_\Omega d\mu(l)\,f(k,l)\,\psi(l);$$

the $f(k,l)$ are the analogues of the matrix entries $f_{kl}$.



Finding a diagonal representation for a given quantum system (i.e., given a Hilbert space
and a Hamiltonian) is in general quite difficult; succeeding is virtually equivalent to
"solving" the theory of the system. Indeed, in terms of the diagonal representation, we can
obtain a full solution of the Heisenberg dynamics. We have

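$$\int_\Omega d\mu(l)\,\dot f(k,l,t)\,\psi(l) = \frac{i}{\hbar}\big((Hf\psi)(k) - (fH\psi)(k)\big)$$

$$= \frac{i}{\hbar}\Big(E(k)\int_\Omega d\mu(l)\,f(k,l,t)\,\psi(l) - \int_\Omega d\mu(l)\,f(k,l,t)\,E(l)\,\psi(l)\Big),$$

from which it follows that

$$\dot f(k,l,t) = \frac{i}{\hbar}\big(E(k) - E(l)\big)f(k,l,t). \qquad (3.4)$$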

In (3.4) we recognize a linear differential equation with constant coefficients, whose general
solution is

$$f(k,l,t) = e^{i(E(k) - E(l))t/\hbar}\,f(k,l,0).$$

Thus the kernel function of the operator f has oscillatory behavior with frequencies

$$\omega_{kl} = \frac{E(k) - E(l)}{\hbar}. \qquad (3.5)$$
This relation, the modern form of the Rydberg-Ritz combination principle found in
1908 by Walter Ritz [211], may be expressed in the form

$$\Delta E = \hbar\omega. \qquad (3.6)$$

The formula (3.6) appears first in Planck's famous paper [197] from 1900, where he explained
the radiation spectrum of a black body. Planck wrote it in the form $\Delta E = h\nu$, where
$h = 2\pi\hbar$ and $\nu = \omega/2\pi$ is the linear frequency. The symbol $\hbar$ for the quotient $h/2\pi$,
which translates this into our formula, was invented much later, in 1930, by Dirac in his
famous book on quantum mechanics [64]. The book contains the Dirac equation but also
Dirac's famous mistake (cf. the next section): he had wrongly interpreted the antiparticle
of the electron predicted by his equation (later named the positron) to be the proton.
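Returning to (3.4) and (3.5): the following Python sketch takes a hypothetical diagonal Hamiltonian with three eigenvalues (units with $\hbar = 1$) and a hypothetical kernel $f(k,l,0)$, here just a $3\times 3$ matrix standing in for the kernel. It compares the time derivative of the closed-form solution $f(k,l,t) = e^{i(E(k)-E(l))t/\hbar}f(k,l,0)$, computed by a central finite difference, with the right-hand side $\frac{i}{\hbar}[H,f]$ of the Heisenberg equation, computed as a matrix commutator. The list `omega` holds the frequencies $\omega_{kl}$ of (3.5).

```python
import cmath

HBAR = 1.0                          # units with hbar = 1 (illustrative)
energies = [0.0, 1.0, 2.5]          # hypothetical eigenvalues E(k) of H
f0 = [[1.0, 2.0, 0.5],              # hypothetical kernel f(k, l, 0)
      [2.0, -1.0, 1.5],
      [0.5, 1.5, 3.0]]
n = len(energies)
H = [[energies[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain n x n matrix product."""
    return [[sum(x[i][m] * y[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def heisenberg_rhs(f):
    """Right-hand side (i/hbar)[H, f] = (i/hbar)(Hf - fH) of the Heisenberg equation."""
    hf, fh = matmul(H, f), matmul(f, H)
    return [[1j / HBAR * (hf[k][l] - fh[k][l]) for l in range(n)] for k in range(n)]

def f_at(t):
    """Closed-form solution f(k,l,t) = exp(i(E(k) - E(l))t/hbar) f(k,l,0)."""
    return [[cmath.exp(1j * (energies[k] - energies[l]) * t / HBAR) * f0[k][l]
             for l in range(n)] for k in range(n)]

# Central finite difference of the closed form versus the Heisenberg right-hand side.
t, dt = 0.7, 1e-5
plus, minus, rhs = f_at(t + dt), f_at(t - dt), heisenberg_rhs(f_at(t))
max_err = max(abs((plus[k][l] - minus[k][l]) / (2 * dt) - rhs[k][l])
              for k in range(n) for l in range(n))
omega = [[(energies[k] - energies[l]) / HBAR for l in range(n)] for k in range(n)]  # (3.5)
print(max_err)
```

The diagonal entries of f do not oscillate at all ($\omega_{kk} = 0$), while each off-diagonal entry oscillates with a single frequency given by an energy difference, in line with the Rydberg-Ritz combination principle.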



3.2 Probing the spectrum of a system 

All physical systems exhibit small (and sometimes large) oscillations of various frequencies, 
collectively referred to as the spectrum of the system. By observing the size of these 
oscillations and their dependence on the frequency, valuable information can be obtained 
about intrinsic properties of the system. Indeed, the resulting science of spectroscopy 
is today one of the indispensable means for obtaining experimental information on the 
structure of chemical materials and the presence of traces of chemical compounds. 

To probe the spectrum of a quantum system, we bring it into contact with a macroscopically
observable (hence classical) weakly damped harmonic oscillator. That we treat just a single
harmonic oscillator is for convenience only. In practice, one often observes many oscillators
simultaneously, e.g., by observing the oscillations of the electromagnetic field in the form
of electromagnetic radiation (light, X-rays, or microwaves). However, in most cases the
oscillators do not interact strongly, and in the case of electromagnetic radiation they do not
interact at all. In that case, probing a system with multiple oscillators results in a linear
superposition of the results of probing with a single oscillator. This is a special case of the
general fact that solutions of linear differential equations depend linearly on the right-hand
side.

From the point of view of the macroscopically observable classical oscillator, the probed
quantum system appears simply as a time-dependent external force $F(t)$ that modifies the
dynamics of the free harmonic oscillator. Instead of the equation $m\ddot q + c\dot q + kq = 0$ we get
the differential equation describing the forced harmonic oscillator, given by

$$m\ddot q + c\dot q + kq = F(t).$$

The external force F is usually the value

$$F(t) = \langle f(t)\rangle$$

of a quantity f from the algebra of quantities of the probed system, as discussed in more
detail in Part II. This follows from the general principles of Section 13.2 for modeling
interactions of a quantum system with a classical, macroscopic system (only the latter is
directly measurable). How classical measurements are to be interpreted in a pure quantum
context will be discussed in Section 7.4.

If the measurement is done far from the probed system, such as a measurement of light
(electromagnetic radiation) emitted by a faraway source (e.g., a star, but also a Bunsen
flame observed by the eye), the back reaction of the classical oscillator on the probed system
can be neglected. Then the probed system can be considered as a Hamiltonian system and
evolves according to the Heisenberg equation (1.12). In particular, the analysis of Section
3.1 applies, and since expectations are linear, the external force F evolves as a superposition
of exponentials $e^{i\omega t}$, where the $\omega$ are differences of eigenvalues of H (divided by $\hbar$). In the
quantum case the spectrum may have a discrete part, leading to a sum of different exponentials
$e^{i\omega t}$ that, as we shall see, leads to conspicuous spikes in the Fourier transform of the response,
and a continuous part, leading to an integral over such terms that typically provides a smooth
background response. In the following, we shall assume for simplicity a purely discrete
spectrum, and hence an expansion of F of the form

$$F(t) = \sum_l F_l\,e^{i\omega_l t},$$

with distinct, real and nonzero frequencies $\omega_l$. However, the analysis holds with obvious
changes also for a (partly or fully) continuous spectrum if the sums are replaced by
appropriate integrals.

The solution to the differential equation consists of a particular solution plus a solution to
the homogeneous equation. Due to damping, the latter is transient and decays to zero. To
get a particular solution, we note that common experience shows that forced oscillations
typically have the same frequency as the force. We therefore make the ansatz

$$q(t) = \sum_l q_l\,e^{i\omega_l t}.$$

Inserting both sums into the differential equation, we obtain the relation

$$\sum_l \big(k - m\omega_l^2 + ic\omega_l\big)\,q_l\,e^{i\omega_l t} = \sum_l F_l\,e^{i\omega_l t},$$

from which we conclude (since the frequencies are distinct) that we have a solution precisely when

$$q_l = \frac{F_l}{k - m\omega_l^2 + ic\omega_l}\quad\text{for all } l.$$

Note that due to our assumptions ($c > 0$ and $\omega_l$ real and nonzero), the denominator cannot
vanish. The energy in the $l$th mode is therefore proportional to

$$|q_l|^2 = \frac{|F_l|^2}{(k - m\omega_l^2)^2 + (c\omega_l)^2}. \qquad (3.7)$$
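The derivation of the mode amplitudes can be checked numerically. The sketch below uses hypothetical parameters $m = 1$, $c = 0.05$, $k = 4$ (so $\sqrt{k/m} = 2$) and a single force component; it verifies that $q_l$ solves $m\ddot q + c\dot q + kq = F_le^{i\omega_lt}$ for the ansatz $q(t) = q_le^{i\omega_lt}$, and scans the forcing frequency to locate the maximum of (3.7). For weak damping the maximum lies just below $\sqrt{k/m}$, analytically at $\omega_l^2 = k/m - c^2/(2m^2)$. The function names are hypothetical.

```python
import math

def mode_amplitude(F_l, omega_l, m, c, k):
    """Particular-solution coefficient q_l = F_l / (k - m*omega_l**2 + i*c*omega_l)."""
    return F_l / complex(k - m * omega_l ** 2, c * omega_l)

def mode_energy(F_l, omega_l, m, c, k):
    """|q_l|^2 as in (3.7), proportional to the energy in mode l."""
    return abs(F_l) ** 2 / ((k - m * omega_l ** 2) ** 2 + (c * omega_l) ** 2)

m, c, k = 1.0, 0.05, 4.0            # hypothetical probe oscillator, sqrt(k/m) = 2
F_l, omega_l = 1.0, 1.3             # one hypothetical force component

# q(t) = q_l exp(i omega_l t): q' = i omega_l q and q'' = -omega_l^2 q.
q_l = mode_amplitude(F_l, omega_l, m, c, k)
residual = (m * (1j * omega_l) ** 2 + c * (1j * omega_l) + k) * q_l - F_l

# Scan the forcing frequency on a fine grid: the response peaks close to sqrt(k/m).
grid = [1e-4 * i for i in range(1, 40001)]
peak = max(grid, key=lambda w: mode_energy(1.0, w, m, c, k))
print(abs(residual), peak)
```

Decreasing c makes the peak both higher and narrower, which is the sharp-resonance behavior discussed next.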



Now first imagine that the system under study has only one frequency, that is, $F_l\neq 0$
for only one $l$. For example, the system under study is also an oscillator that is swinging
with a certain frequency. In this case the oscillator with which we probe the system will
also swing with that same frequency as the probed system, but with an amplitude given
by (3.7). We see that for $\omega = \sqrt{k/m}$ close to $\omega_l$ the oscillator responds most to the force it
feels from the probed system. The frequency $\omega = \sqrt{k/m}$ is called the resonance frequency
of the oscillator. We know this from phenomena of daily (or not so daily) life, such as
pushing a swing (or riding a car with a defective shock absorber): if you push with the 'right'
frequency, the swing goes higher and higher, while pushing with another frequency results in
a seemingly chaotic, incoherent swinging.

Figure 3.1: Lorentz shape. The absorbed energy of the oscillator with varying frequency $\omega$.

Returning to the case where several $F_l$ are nonzero, we see that the oscillator will swing
with the same frequencies as the probed system. But the intensity with which the oscillator
swings depends on the positions of the $\omega_l$ relative to the resonance frequency. Suppose
that c is relatively small, so that we can ignore the term $c\omega_l$ in the denominator of (3.7).
Then the $q_l$ for which $\omega_l$ is close to $\omega$ show a higher intensity.

Looking for resonances with an oscillator that has an adjustable frequency $\omega$ therefore gives
a way to experimentally find the frequencies in the force acting on the oscillator. If the
frequency $\omega$ passes over one of the frequencies of the probed system, the oscillator will
swing more intensively.
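The spectrometer idea, sweeping the adjustable frequency of the probe and looking for resonances, can be sketched as follows. We assume a hypothetical force with three components $(\omega_l, F_l)$, keep m and c fixed, vary the natural frequency $\omega_0 = \sqrt{k/m}$ of the probe, and record the time-averaged response $\sum_l |q_l|^2$ (for distinct frequencies the cross terms average out over time). The local maxima of the recorded curve should sit at the frequencies $\omega_l$ of the force; the function `response` and all numbers are illustrative.

```python
import math

def response(w0, force, m=1.0, c=0.02):
    """Time-averaged response sum_l |q_l|^2 of a probe oscillator with natural
    frequency w0 = sqrt(k/m), driven by F(t) = sum_l F_l exp(i w_l t)."""
    k = m * w0 ** 2
    return sum(abs(F) ** 2 / ((k - m * w ** 2) ** 2 + (c * w) ** 2) for w, F in force)

force = [(1.0, 1.0), (1.7, 0.5), (3.0, 2.0)]    # hypothetical (w_l, F_l) pairs
grid = [0.5 + 0.001 * i for i in range(3001)]   # probe frequencies 0.5 ... 3.5
values = [response(w0, force) for w0 in grid]

# Local maxima of the recorded curve = detected resonances.
peaks = [grid[i] for i in range(1, len(grid) - 1)
         if values[i - 1] < values[i] > values[i + 1]]
print([round(p, 2) for p in peaks])
```

The peak heights differ because they scale with $|F_l|^2/(c\omega_l)^2$, so a weak component can still produce a clearly visible line if the damping is small.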

The resonances occur around a natural frequency, but the width of the interval in
which the system shows a resonance also carries information. If the interval is small, one speaks
of a sharp resonance, and this corresponds to a discrete or nearly discrete spectrum of
frequencies. If the resonance is not sharp, the response corresponds to a continuous
spectrum. The graph that shows the absorbed energy (which is proportional to $|q_l|^2$) as a
function of the frequency $\omega_l$ for a system with one resonance frequency typically has a
Lorentz shape, according to the formula (3.7): there is a peak around $\omega_0 = \sqrt{k/m}$ with
a certain width, and on both sides of the peak the function tends to zero at plus and minus
infinity. In Figure 3.1 we displayed a graph of a Lorentz shape for a harmonic oscillator
with varying frequency $\omega$ in contact with a probed system that has one $F_l$ nonzero for the








frequency $\omega_0$.

For general systems with several resonance frequencies, the graph is a superposition of such
curves, and the peaks around the resonance frequencies can have different widths and
different heights. This graph is recorded by typical spectrometers, and the shapes and positions
of characteristic pieces of the graph contain important information about the system. We 
shall assume that the peaks have already been translated into resonance frequencies (a non- 
trivial task in case of overlapping resonances), and concentrate on relating these frequencies 
to the Hamiltonian of the system. This is done in the next section. 



3.3 The early history of quantum mechanics 

In this section we remark on some important aspects of the history of quantum mechanics. 
We focus on the physics of the atom, which was one of the main reasons to develop quantum 
mechanics. In Section 3.5 we discuss the physics of the black body and the history of the 
formula of Planck, which describes black body radiation. For an interesting historical
account we refer, for example, to van der Waerden [249] or Zeidler [268].

The importance of the spectrum in quantum physics is not only due to the preceding 
analysis, which allows a complete solution of the dynamics, but also to the fact that the 
spectrum can easily be probed experimentally. Indeed, spectral data (from black body 
radiation and the spectral absorption and emission lines of hydrogen) were historically the 
trigger for the development of modern quantum theory. Even the name spectrum for the 
set of eigenvalues was derived from this connection to experiment. 

Probing the spectrum through contact with a damped harmonic oscillator has been dis- 
cussed in Section 3.2. Note that the observed frequencies give the spectrum of the force, 
not the spectrum of the Hamiltonian. As derived above, the spectrum of the force con- 
sists of the spectral differences of the Hamiltonian spectrum. This is in accordance with 
the fact that (in nonrelativistic mechanics) absolute energy is meaningless and only energy 
differences are observable. 

In case of the harmonic oscillator, the spectrum of the Hamiltonian H is discrete (see
Chapter 14 for the details and derivation), consisting, up to the constant $E_0$, of the
nonnegative integral multiples $k\hbar\omega$ of the energy quantum $\hbar\omega$. Thus the set of labels for the
eigenvectors $|k\rangle$ is discrete, $\Omega = \mathbb N_0$. The number of allowed frequencies is thus countable and
the external force may be expanded into a sum of the form

$$F(t) = \sum_{k,l\in\mathbb N_0} F_{kl}\,e^{i\omega_{kl}t}.$$

Explicitly, the frequencies are given by $\omega_{kl} = \omega(k - l)$ ($k, l\in\mathbb N_0$). Thus quantum mechanics
produces overtones. This is not a genuinely quantum mechanical feature; in classical
mechanics one finds overtones in a similar setting, for example in the vibrations of a guitar
string.
A historically more interesting system is the hydrogen atom, where the energies are given
by an equation of the form

$$E_k = -\frac{C}{k^2}\qquad (k = 1, 2, 3, \dots)$$

for some constant C. Then the frequencies are given by the Rydberg formula

$$\omega_{lk} = \frac{E_l - E_k}{\hbar} = 2\pi c R_H\Big(\frac{1}{k^2} - \frac{1}{l^2}\Big)\qquad (l > k \ge 1),$$

where $R_H \approx 1.1\cdot 10^7\,\mathrm{m}^{-1}$ is the Rydberg constant. The Rydberg formula correctly gives
the observed spectral lines of the hydrogen atom. The formula was discovered by Rydberg
in 1889 (Martinson and Curtis [167]) after preliminary work of Balmer, who found the
formula for the Balmer series of spectral lines (given by $k = 2$). Schrödinger derived this
formula using the theoretical framework of quantum mechanics.
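As an arithmetic illustration of the Rydberg formula, the sketch below evaluates it in its usual wavenumber form $1/\lambda = R_H(1/k^2 - 1/l^2)$ for the Balmer series ($k = 2$), using the approximate value $R_H \approx 1.0968\cdot 10^7\,\mathrm{m}^{-1}$ (an assumption, slightly more precise than the value quoted above), and converts to angular frequencies $\omega = 2\pi c/\lambda$. The results should lie close to the familiar visible hydrogen lines near 656 nm, 486 nm, 434 nm and 410 nm. The function names are hypothetical.

```python
import math

R_H = 1.09678e7          # Rydberg constant for hydrogen in 1/m (approximate value)
C = 299_792_458.0        # speed of light in m/s

def inverse_wavelength(k, l):
    """Wavenumber 1/lambda = R_H (1/k^2 - 1/l^2) of the line between levels l > k."""
    return R_H * (1 / k ** 2 - 1 / l ** 2)

def angular_frequency(k, l):
    """Angular frequency omega = 2 pi c / lambda of the same line."""
    return 2 * math.pi * C * inverse_wavelength(k, l)

# Balmer series: lower level k = 2, upper levels l = 3, 4, 5, 6.
balmer_nm = {l: 1e9 / inverse_wavelength(2, l) for l in range(3, 7)}
for l, lam in balmer_nm.items():
    print(f"l={l}: lambda = {lam:.1f} nm, omega = {angular_frequency(2, l):.3e} 1/s")
```

As $l\to\infty$ the lines crowd together towards the series limit $1/\lambda = R_H/4$, i.e. about 365 nm.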

Let us review the situation at the time when quantum mechanics was conceived. Around
1900 physicists were experimentally exploring the atom, which until then had been (since
antiquity) only a philosophically disputable part of Nature. The experiments clearly indicated
that atoms existed and that matter was built up from atoms. The physicist Boltzmann had
argued that atoms existed, but his point of view had not been accepted; only after his death
in 1906 was the existence of atoms unarguably proved, by experiments of Perrin around
1909. This led to the problem of finding the constituents of the atom and its structure. In
1897 Thomson had discovered the electron as a subatomic particle. Since the atom is
electrically neutral, it has to contain positively charged particles. Thomson thought of
a model in which the atom was a positively charged sphere with the electrons embedded in this
"plum pudding" of positive charge. But then in 1911 Rutherford put Thomson's model
to the test; Marsden and Geiger, who were working under the supervision of Rutherford,
shot α-particles at a thin foil of gold and looked at the scattering pattern [103]. The
experiment is therefore called the Geiger-Marsden experiment. At that time, α-particles
were considered a special radiation emitted by some 'radioactive' elements; now we know
that they are helium nuclei, that is, helium atoms stripped of their electrons.

Since the α-particles are positive, they have a particular kind of interaction with the
positively charged sphere of Thomson's model. But since the electrons swim around in the
positive charge, the net charge is zero and most of the interaction is screened off. Therefore it
was expected that the α-particles would be only slightly deflected. However, the pattern
was not at all like that! It rather looked as if almost all α-particles went straight through
and a small percentage was deflected by a concentrated positive charge. Most α-particles
that were deflected were scattered backwards, implying that they had an almost head-on
collision with a positive charge.

The very small percentage of scattered α-particles indicated that the chance that an
α-particle meets a positively charged nucleus on its way is very small, which implies that the
nucleus is very small compared to the atom. Therefore Rutherford (who wrote a paper to
explain the results of the Geiger-Marsden experiment) concluded that the nucleus of an
atom is positively charged and the electrons circle around the nucleus, and furthermore, that the
size of the nucleus is very small compared to the radii at which the electrons circle around
it [220]. If one imagines the atomic nucleus to have the size of a pea placed at the top of
the Eiffel Tower, the closest electrons would circle in an orbit that touches the ground;
the atom is mostly empty.

In 1918 it was again Rutherford who performed an important experiment from which he 
concluded that the electric charge of the atomic nucleus was carried by little particles, called 
protons. The hydrogen atom was found to be the simplest atom; it consists of a proton
and one electron circling around the proton. Because of this experiment the discovery of 
the proton is attributed to Rutherford. 

Classically, an electron circling around in an electric field radiates and thus loses energy.
The question thus arises why the hydrogen atom is stable. Again classically, an electron can
circle around a positive charge with arbitrary energy. If the electron changes its orbit, this
happens gradually, hence the energy changes continuously and the absorption or emission
patterns of the hydrogen atom should be continuous. But experiments done by Rydberg in
1888 and Balmer in 1885 showed that hydrogen absorbed or emitted light at well-defined
frequencies, visible as lines in the spectrum obtained by refraction. For the atomic model
this implies that the electron can only have well-defined energies separated by gaps
(forbidden energies). In 1913 Bohr wrote a series of papers [36, 37, 38, 39] in which he postulated
a model to account for this. Bohr postulated that angular momentum is quantized (if p is
the momentum of the electron and r the radius vector, then the angular momentum is $\mathbf L = \mathbf r\times\mathbf p$,
where the cross denotes the vector product) and that the electron does not lose energy
continuously. With these assumptions he could explain the spectrum observed by Rydberg.

The model of Bohr did not explain the behavior of atoms; it only gave rules the atom
had to obey. In 1925 Werner Heisenberg wrote a paper [110] in which he tried to give a
fundamental basis for the rules of quantum mechanics. Heisenberg described the dynamics
of the transitions of an electron in an atom by using the 'states' of the electron as labels.
For example, he wrote the frequency emitted by an electron jumping from a state n to a
state $n - \alpha$ as $\omega(n, n-\alpha)$. Just two months later Max Born and Pascual Jordan wrote a
paper [42] about the paper of Heisenberg, in which they made clear that what Heisenberg
actually did was promoting observables to matrices. The three of them, Born, Jordan and
Heisenberg, wrote in the same year a paper [160] in which they elaborated on the formalism
they developed. Also in the same year 1925 Paul Dirac wrote a paper in response to the
paper of Heisenberg, in which the remarkable relation $q_rp_s - p_sq_r = \delta_{rs}\,i\hbar$ appeared. Dirac
tried to find the relation between a classical theory and the corresponding quantum theory.
In fact, Dirac postulated this equation: "we make the fundamental assumption that the
difference between the Heisenberg products of two quantum quantities is equal to $ih/2\pi$
times their Poisson bracket expression".

So, in the early years of quantum mechanics, the dynamics of the observables was
described by a kind of matrix mechanics. (A modern version of this is the view presented
in the present book.) Based on work of de Broglie, Schrödinger came up with a differential
equation for the nonrelativistic electron [224]. The probability interpretation for
Schrödinger's wave function was found by Born. In 1927, Pauli reformulated his exclusion
principle in terms of spin and antisymmetry. In 1928, Dirac discovered the Dirac equation
for the relativistic electron. In 1932, the early years concluded with the discovery of the
positron by Anderson and the neutron by Chadwick, which were enough to explain the
behavior of ordinary matter and radioactivity. But the forces that hold the nucleus together
were still unknown, and already in 1934, Yukawa predicted the existence of new particles,
the mesons. Since then the particle zoo has increased further and further.

A number of Nobel prizes (most of them in physics, but one in chemistry - early research 
on atoms was interdisciplinary) for the pioneers accompanied the early development of 
quantum mechanics^: 



• 1908 Ernest Rutherford (Nobel prize in chemistry), for his investigations into the
disintegration of the elements, and the chemistry of radioactive substances 

• 1918 Max Planck, in recognition of the services he rendered to the advancement of 
physics by his discovery of energy quanta 

• 1921 Albert Einstein, for his services to theoretical physics, and especially for his 
discovery of the law of the photoelectric effect 

• 1922 Niels Bohr, for his services in the investigation of the structure of atoms and of 
the radiation emanating from them 

• 1929 Louis de Broglie, for his discovery of the wave nature of electrons

• 1932 Werner Heisenberg, for the creation of quantum mechanics, the application of
which has led among others to the discovery of the allotropic forms of hydrogen 

• 1933 Erwin Schrödinger and Paul A.M. Dirac, for the discovery of new productive
forms of atomic theory 

• 1935 James Chadwick, for the discovery of the neutron 

• 1936 Carl D. Anderson, for his discovery of the positron 



and belatedly, but still for work done before 1935, 



• 1945 Wolfgang Pauli, for the discovery of the exclusion principle, also called the Pauli
principle

• 1949 Hideki Yukawa, for his prediction of the existence of mesons on the basis of 
theoretical work on nuclear forces 

• 1954 Max Born, for his fundamental research in quantum mechanics, especially for 
his statistical interpretation of the wave function 

^The remarks to each Nobel laureate are the official wordings in the announcements of the Nobel
prizes. For press announcements, Nobel lectures of the laureates, and their biographies, see the web site
http://nobelprize.org/physics/laureates.

The story of the discovery of antimatter is interesting. Although Dirac called it a prediction
in his Nobel lecture ("There is one other feature of these equations which I should now like to
discuss, a feature which led to the prediction of the positron"), it was only a postdiction. Yes,
he had a theory in which there were antiparticles. But before the positron was discovered,
Dirac thought the antiparticles had to be protons (though there was a problem with the
mass), since new particles were inconceivable at that time. Official history seems to have
followed Dirac's lead in his Nobel lecture, and tells the story as it should have happened
from the point of view of the theorist, namely that he (i.e., theory) actually predicted the positron.
The truth is a little different.

Anderson discovered and named the positron in 1932. He wrote the announcement of his 
discovery in Science [8], "with due reserve in interpretation". The proper publication [10], 
where he also predicted "negative protons" (now called antiprotons) , was still without any 
awareness of Dirac's theory. It is in the subsequent paper [9] that Anderson relates the 
positron to Dirac's theory. 

Heisenberg, Dirac, and Anderson were all 31 years old when they got the Nobel prize.
The fact that Anderson's paper [10] is very rarely cited^ should cast some doubt on the
relevance of citation counts for actual impact in science. 



3.4 The spectrum of many-particle systems 

To give a better intuition for what kind of spectra quantum systems can be expected to 
have, we discuss here the spectrum of many-particle systems from an informal point of 
view. 

There are bound states, where all particles of the system stay together, and there are
scattering states, where the system is broken up into several fragments moving independently
but possibly influencing each other. The nomenclature comes from scattering
experiments in physics: shooting particles at each other can result in the formation of a
system where the particles are bound together, or the particles can scatter off each
other. In the case of a scattering process, different arrangements (i.e., partitions of the
set of individual particles into fragments, each forming a subsystem moving together) describe
the combination of particles before a collision and their recombination in the debris after a
collision.

The discrete spectrum of a Hamiltonian $H$ corresponds to the bound states; each discrete
eigenvalue corresponds to a different mode of the bound system. The study of the discrete spectrum
of compound systems is the domain of spectroscopy. We shall return to this topic in
Chapter 17, when the machinery to understand a spectrum is fully developed.

The continuous part of the spectrum corresponds to the scattering states. In general, the
spectrum is discrete below a certain energy level, called the dissociation threshold, and
above the dissociation threshold the spectrum is continuous. For the hydrogen atom, the
dissociation threshold lies 13.6 eV above the ground state energy. For the harmonic oscillator, the dissociation threshold is
infinite. In such a case, where the dissociation threshold is infinite, there is no continuous
spectrum and the system is always bound; we call this confinement. For example, three
quarks always form a bound state, that is, they are confined: a single quark cannot break
loose from its partners. It may also be the case that there is no bound state; for example,
the atoms in inert gases don't form bound states, hence a system consisting of more than
one such atom has only a continuous spectrum.

^http://www.prola.aps.org/ lists only 37 citations, and only 5 before 1954. The paper [9] is cited
35 times.

In scattering experiments the ingoing particles and the outgoing particles can be different.
Hence one needs to keep track of what precisely went where. After the scattering the
particles separate from each other in different clusters. The constituents in cluster $i$ form
a bound state, possibly an excited one, whose internal energy we denote by $E_i$. If cluster $i$ is
moving with momentum $p_i$, its kinetic energy is $p_i^2/2m_i$, where $m_i$ is the
mass of cluster $i$. If there are $N$ clusters after a collision (scattering), the resulting total
energy is

$$E = \sum_{i=1}^{N} \Big(E_i + \frac{p_i^2}{2m_i}\Big).$$

In scattering experiments a possible outcome of clusters and their constituents is called a
channel. It is very common in particle physics that a single reaction, like shooting two
protons at each other, has more than one channel. We see that in each channel there
is a continuous spectrum above a certain energy level $A$, which is the sum of the ground
state energies of the different clusters. To theoretically disentangle the spectrum, one uses
an analytic continuation of the scattering amplitudes. We thus view the spectrum as a
subset of the complex plane. When the momenta are multiplied by a complex phase that
has a nonzero imaginary part, the continuous part of the spectrum is rotated away from the
real axis into the complex plane. The bound states still appear on the real line as isolated
points, that is, discrete. But now each bound state with energy above $A$ has a half-line
attached to it, representing the continuously varying momentum of the corresponding cluster.
The technique of disentangling the spectrum using analytic continuation is called complex
scaling. For more background and rigorous mathematical arguments, see, e.g., Simon
[227], Moiseyev [171], or Bohm [35].
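As a rough numerical illustration of complex scaling (a sketch under our own simplifying assumptions, not a computation from the cited references), model each channel by a threshold $A$, the sum of the cluster ground-state energies, plus the kinetic energy $p^2/2m$ of a single relative motion of mass $m$. Multiplying $p$ by the phase $e^{-i\theta}$ multiplies the kinetic energy by $e^{-2i\theta}$, so the continuum of each channel becomes a ray leaving its threshold at angle $-2\theta$, while real bound-state energies are unaffected. The function names and cluster energies are hypothetical.

```python
import cmath

def channel_threshold(cluster_ground_energies):
    """Threshold A of a channel: sum of the ground-state energies of its clusters."""
    return sum(cluster_ground_energies)

def scaled_continuum(threshold, mass, theta, momenta):
    """Energies A + (e^{-i*theta} p)^2 / (2m) for real momenta p > 0.

    Multiplying p by the phase e^{-i*theta} multiplies the kinetic
    energy p^2/(2m) by e^{-2i*theta}.
    """
    phase = cmath.exp(-1j * theta)
    return [threshold + (phase * p) ** 2 / (2 * mass) for p in momenta]

# Hypothetical channels (arbitrary units), given by their cluster ground-state energies.
thresholds = [channel_threshold([-1.0, -0.5]), channel_threshold([-0.25])]
theta = 0.3
momenta = [0.5 * k for k in range(1, 6)]

for A in thresholds:
    energies = scaled_continuum(A, mass=1.0, theta=theta, momenta=momenta)
    # Each continuum point lies on the ray from A at angle -2*theta.
    print(A, [round(cmath.phase(E - A), 6) for E in energies])
```

Each printed list of angles should consist of the single value $-2\theta = -0.6$, one rotated half-line per channel threshold.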

Dissipation. If we admit dissipation, the Hamiltonian is no longer Hermitian, since there
is typically an antihermitian contribution to the potential, generally called an optical
potential since it was first used in optics. Also, the dynamics need no longer be governed
by the Heisenberg equation, but can, both in the classical and in the quantum case, be of
the more general form

(3.9)

with Lindblad operators $L_j$ encoding interactions with the unmodelled environment into
which the lost energy dissipates, and complex coefficients $G_{jk}$ forming a symmetric, positive
definite matrix. Remembering that the Lie product with a fixed element acts as a derivation, the additional terms can be
viewed as generalized diffusion terms; indeed, the dynamics (3.9) describes classically, for
example, reaction-diffusion equations, and its quantum version is the quantum equivalent
of stochastic differential equations, which model systems like Brownian motion and give
microscopic models of diffusion processes. For details, see, e.g., Gardiner [88], Breuer
& Petruccione [45].

Assuming that the terms in the sum of (3.9) are negligible, the dynamics satisfies the Heisenberg
equation, and the above analysis applies with small changes. However, since $H$ is no
longer Hermitian, the energy levels typically acquire a possibly nonzero (and then positive)
imaginary part. Isolated eigenvalues with positive imaginary parts are called resonances.
The oscillation frequencies are still of the form $\hbar\omega = \Delta E$, but since the energies have a
positive imaginary part, the oscillations will be damped, as can be seen by looking at the
form of $e^{i\omega t}$. That this does not lead to a decay of the response of the oscillator is due to
stochastic contributions modelled by the Lindblad terms and neglected in our simplified
analysis.

Resonances with tiny imaginary parts behave almost like bound states, and represent
unstable particles, which decay in a stochastic manner; the value $\Gamma = 2\,\mathrm{Im}\,\omega$ is their
decay rate, and its inverse $1/\Gamma$ is their lifetime, defined as the time after which (in a large sample of unstable particles) the number of
undecayed particles left is reduced by a factor of $e$, the base of the natural logarithm.
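A minimal sketch of the lifetime statement, assuming (as in the simplified analysis above) that a resonance contributes a factor $e^{i\omega t}$ with $\mathrm{Im}\,\omega > 0$, and that the surviving fraction of undecayed particles is $|e^{i\omega t}|^2 = e^{-\Gamma t}$ with $\Gamma = 2\,\mathrm{Im}\,\omega$. The value of $\omega$ is invented; after the time $1/\Gamma$ the surviving fraction should be $1/e$.

```python
import cmath
import math

def surviving_fraction(omega, t):
    """Squared modulus of e^{i omega t}: fraction of undecayed particles at time t."""
    return abs(cmath.exp(1j * omega * t)) ** 2

omega = 5.0 + 0.01j          # hypothetical resonance: Re = oscillation, Im = damping
gamma = 2 * omega.imag       # decay rate Gamma = 2 Im(omega)
tau = 1 / gamma              # lifetime: population reduced by the factor e

print(surviving_fraction(omega, tau), math.exp(-1))
```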

Thus the spectrum of a Hamiltonian contains valuable experimentally observable information
about a quantum system.



3.5 Black body radiation 

In the remainder of this chapter, we discuss the spectrum of a black body and some of its 
consequences. 

The black body plays an important role in the history of quantum mechanics. Applying some basic concepts of
quantum mechanics and statistical mechanics one arrives at the distribution formula first
derived by Max Planck in December 1900 [196]. According to van der Waerden in his
(partially autobiographical) book [249], the presentation of Planck in December 1900 was
the birth of quantum mechanics.

What is a black body? A body that looks black does not reflect any light; it absorbs
all incoming light. Hence if some radiation comes from a perfectly black body, it must
be due to the interaction of the internal degrees of freedom with light. It is hard to
construct a black body experimentally. The theoretical idea is to have a hollow box with
a single little hole, through which the box can emit radiation outwards. Since the hole
is assumed to be very small, light falling in through the hole is practically never reflected
back out through it. Thus no light will be reflected (or at least almost no light). In practice many
objects behave like black bodies above a certain temperature. The sun does not reflect
a substantial amount of light (where should it come from?) compared to the amount it
radiates. Therefore one of the best black bodies is the sun.

Given a black body, there is a positive integrable function $f(\omega)$ of the frequency $\omega$, such
that the amount of energy radiated in the frequency interval $[\omega_1, \omega_2]$ is

$$E([\omega_1,\omega_2]) = \int_{\omega_1}^{\omega_2} d\omega\, f(\omega).$$

The function $f(\omega)$ is the radiation-energy density, and it is the main object of this section.
The importance of the black body lies in the fact that the radiation emitted
is only due to its internal energy and its interaction with light. In practice a system always
interacts with the environment, and light falls onto it (since we want to 'see'
where the black body is, the latter is often inevitable). What would we expect from the
radiation-energy density? First, since $\omega = 0$ means that the energy of the photons emitted
is $\hbar\omega = 0$, and $\omega < 0$ is not a possibility, we have $f(0) = 0$. Second, the total integral
$\int_0^\infty d\omega\, f(\omega)$ represents the total energy of the body, which is finite. Therefore we certainly
want $f$ to be integrable, i.e., $f \in L^1(\mathbb{R}_+)$, and for a well-behaved $f$ this forces
$\lim_{\omega\to\infty} f(\omega) = 0$.

We know from experience that black bodies (like dark metals) do not visibly glow
when they are at room temperature, but heating them up makes them glow red.
When we raise the temperature, the color shifts more and more towards blue. This
phenomenon can also be seen in flames: the outer, cooler part is red, while further inwards
the flame gets lighter, reaches white, goes over to blue, and then becomes invisible.
Empirically one concludes that the function $f(\omega)$ has a maximum at a frequency $\omega_{\max}$
that depends on the temperature: the larger the temperature $T$, the larger $\omega_{\max}$. Before
1900 it was already found that the ratio $\omega_{\max}/T$ was almost independent of the body
that was heated up. In 1893 the physicist Wilhelm Wien^ applied the statistical mechanics
developed by Maxwell and Boltzmann to the laws of thermodynamics to derive Wien's
displacement law [261]

$$\frac{\omega_{\max}}{T} = \frac{2\pi c}{b},$$

where $c$ is the speed of light and $b$ is a constant whose numerical value is approximately
$2.9 \cdot 10^{-3}\,\mathrm{m\cdot K}$. In 1896 Wien derived a formula, called Wien's approximation, for the
radiation density

$$f(\omega) = a\,\omega^3 e^{-b\omega/T}, \tag{3.10}$$

for some parameters $a, b > 0$. It is clear that the proposed $f$ is integrable and satisfies
$f(0) = 0$. For large $\omega$ this radiation-energy density matches the observed densities; however,
for small $\omega$ Wien's radiation density does not match the experiments.

On the other hand, there were other radiation laws. First, there was Stefan's law (or
Stefan-Boltzmann law), derived on the basis of empirical results in 1879 [233]. The statement
of Stefan's law is that the total energy radiated per second by a hot radiating body is
proportional to the fourth power of the temperature:

$$\frac{dE}{dt} = \sigma A T^4,$$

where $A$ is the area of the body and $\sigma$ is a constant. In 1884 Boltzmann gave a theoretical
derivation of Stefan's law using the theoretical tools of statistical mechanics [40].

^His real name is rather long: Wilhelm Carl Werner Otto Fritz Franz Wien.

The second radiation law known in 1900 was Rayleigh's law. Lord Rayleigh used classical
mechanics to derive a better description of the radiation density for low values of $\omega$ [204].
He proposed

$$f(\omega) = \gamma\,\omega^2,$$

which is clearly wrong for large $\omega$ and is not even integrable. Later, in 1905, Lord Rayleigh
improved the derivation of his proposal in a collaboration with Sir James Jeans, again based
on purely classical arguments. Although their result was interesting, it did not match
the experiments for high $\omega$. In December 1900, Max Planck had given a seminar with a
derivation of $f(\omega)$ that resulted in a radiation-energy density matching the experiments
both for low and for high $\omega$. Even more, Planck's formula reproduced Wien's
displacement law, Wien's approximation, Rayleigh's proposal and Stefan's law. The formula
Planck derived gives the energy density of a black body in thermal equilibrium, from
which one obtains the radiation-energy density

$$f(\omega) = \frac{V\hbar}{\pi^2 c^3}\,\frac{\omega^3}{e^{\beta\hbar\omega} - 1},$$

where $V$ is the volume of the black body (actually of the cavity), $\beta = 1/kT$, and $k$ is
Boltzmann's constant. Indeed, for low $\omega$ we get an expression that is quadratic in $\omega$, for high
$\omega$ we get Wien's approximation, and integrating the expression over $\omega$ one sees that the integral is
proportional to $T^4$. The agreement with Wien's displacement law and with Stefan's law will be
shown in Section 3.7.
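The limiting forms claimed above can be checked numerically in the dimensionless variable $x = \beta\hbar\omega$: after removing the common factor $V(kT)^3/(\pi^2c^3\hbar^2)$, Planck's density becomes $x^3/(e^x-1)$, its low-frequency (Rayleigh-type) form is $x^2$, and Wien's form is $x^3e^{-x}$. This is an illustrative sketch, not part of the historical argument.

```python
import math

def planck(x):
    """Planck density in the variable x = beta*hbar*omega (common prefactor dropped)."""
    return x**3 / math.expm1(x)

def rayleigh(x):
    """Low-frequency form: quadratic in x."""
    return x**2

def wien(x):
    """High-frequency (Wien) form: x^3 e^{-x}."""
    return x**3 * math.exp(-x)

for x in (1e-3, 1e-1, 1.0, 10.0, 30.0):
    print(f"x={x:g}: Planck/Rayleigh={planck(x)/rayleigh(x):.6f}, "
          f"Planck/Wien={planck(x)/wien(x):.6f}")
```

The first ratio, $x/(e^x-1)$, tends to 1 as $x \to 0$; the second, $e^x/(e^x-1)$, tends to 1 as $x \to \infty$.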

So what precisely did Planck do differently from the others? The key ingredient in Planck's
derivation is to consider the constituents of the black body as follows: the black body is
just a cavity whose inner walls can interact with light. The walls of the
cavity are made of molecules that behave like compounds of harmonic oscillators. Planck
assumed that the energies of the molecules take values in some discrete set: the states of
the molecules do not vary continuously but are discrete. Hence we can put the states in
bijection with the natural numbers. Furthermore he assumed that the light inside the cavity
induces transitions in the molecules by absorbing or emitting radiation. A transition between
a state labelled $n$ with energy $E_n$ and a state labelled $m$ with energy $E_m$
is only possible if the energy difference and the frequency are related by $|E_n - E_m| = \hbar\omega$.
Thus, by discretizing the states of the interior of the black body, the interaction with light
is restricted to a discrete set of frequencies. At the time, Planck saw the discretization as a
purely theoretical and mathematical tool that would bear no relation to reality. It just
reproduced the correct results, which was most important: it gave a formula that fitted all
experiments. Very puzzling at the time was the necessary assumption that the energy was
quantized, an assumption that marked the start of the quantum era. It took some time
until the derivation of Planck's law was given a clear meaning.



3.6 Derivation of Planck's law 



In 1916 Einstein gave a more transparent derivation, which we shall present below. In modern
textbooks one can find a one-page derivation, and we will present such a proof below as well.
For both derivations we need a basic fact from statistical mechanics, called the Boltzmann
distribution.

Suppose that we have a physical system consisting of many identical molecules (or atoms, or
any other smaller subsystems). Each molecule can attain different states that are labelled
with integers $n = 0, 1, 2, \ldots$. In a modern treatment, these states are identified with the
eigenstates of the quantum Hamiltonian, and we shall use this terminology, though it was
not available when Einstein wrote his paper. We thus assume that the spectrum of the
molecules is discrete and that there is a bijection between the eigenstates of the molecule and
the natural numbers. Each eigenstate $n$ of the molecule corresponds to an eigenvalue $E_n$
of the Hamiltonian, giving the energy the molecule has in eigenstate $n$. The Boltzmann
distribution gives the relative frequency of eigenstates of the molecules. Writing $N(n)$ for
the number of molecules in state $n$, the Boltzmann distribution dictates that

$$\frac{N(n)}{N(m)} = e^{-(E_n - E_m)/kT} \tag{3.11}$$

when the system is in thermal equilibrium with itself and with the surrounding system.
Thus, the temperature $T$ has to be constant. Such a 'mixed' state, where the volume, the
temperature, and the number of particles are kept constant and the system is in thermal equilibrium
with its environment, is called a canonical ensemble. A derivation of the Boltzmann
distribution can be found in many elementary textbooks on statistical physics, e.g., Reichl
[205], Mandl [162], Huang [116], or Kittel [136].

The probability $p_n$ of measuring an arbitrary molecule to be in state $n$ is

$$p_n = \frac{e^{-E_n/kT}}{Z},$$

where $Z$ is the partition function

$$Z = \sum_n e^{-E_n/kT}.$$

One can thus rewrite

$$p_n = e^{-(E_n - F)/kT},$$

where we defined the Helmholtz free energy $F$ as

$$F = -kT \ln Z.$$

One often regroups the states into states that have equal energy. Then to each natural
number $n$ corresponds an energy $E_n$ and a natural number $g_n$ counting the number of states
with energy $E_n$. The number $g_n$ is called the degeneracy of the energy $E_n$. Thus we have

$$Z = \sum_n g_n e^{-E_n/kT},$$

and the probability $p_n$ of measuring a molecule with energy $E_n$ is

$$p_n = g_n e^{-(E_n - F)/kT}.$$

The constant $\beta = 1/kT$ is called the inverse temperature and plays a fundamental role;
in statistical physics, it is customary to express all quantities in terms of $\beta$. The average
$\bar E$ of the energy is found by

$$\bar E = \sum_n E_n p_n = -\frac{\partial}{\partial\beta} \ln Z(\beta).$$
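A short sketch checking the formulas for $Z$, $F$, $p_n$ and $\bar E$ on a toy spectrum. The energies and degeneracies are invented, we use units with $k = 1$, and $-\partial\ln Z/\partial\beta$ is approximated by a central difference; all of these are choices of the sketch.

```python
import math

# Hypothetical spectrum in units with k = 1: energies E_n and degeneracies g_n.
energies = [0.0, 1.0, 2.5]
degeneracies = [1, 3, 5]

def partition_function(beta):
    """Z = sum_n g_n exp(-beta E_n)."""
    return sum(g * math.exp(-beta * E) for E, g in zip(energies, degeneracies))

def free_energy(beta):
    """F = -kT ln Z with k = 1, i.e. -ln(Z)/beta."""
    return -math.log(partition_function(beta)) / beta

def probabilities(beta):
    """p_n = g_n exp(-beta (E_n - F))."""
    F = free_energy(beta)
    return [g * math.exp(-beta * (E - F)) for E, g in zip(energies, degeneracies)]

def mean_energy(beta):
    return sum(E * p for E, p in zip(energies, probabilities(beta)))

beta, h = 0.7, 1e-5
# Central-difference approximation of -d(ln Z)/d(beta).
derivative = -(math.log(partition_function(beta + h))
               - math.log(partition_function(beta - h))) / (2 * h)
print(sum(probabilities(beta)), mean_energy(beta), derivative)
```

The probabilities sum to 1, and the ratio of two of them reproduces the Boltzmann factor (3.11) weighted by the degeneracies.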



Einstein's derivation. We now focus on two energies in the molecule, $E_n$ and $E_m$ with
$E_m > E_n$ and degeneracies $g_n$ and $g_m$, and assume the molecules interact with
light. There are three types of processes that might happen: (i) A molecule in state $m$
might decay to state $n$ while emitting light with the frequency $\omega$ given by

$$\hbar\omega = E_m - E_n; \tag{3.12}$$

this process is called spontaneous decay or spontaneous emission. (ii) A molecule might jump from $n$ to $m$ by
absorbing light with the right frequency (3.12). (iii) A molecule decays from $m$ to $n$ by
being kicked by light having the right frequency (3.12); this process is called induced
emission. Thus, there is one transition which happens even in the absence of light:
with spontaneous emission the molecule jumps from $m$ to $n$ and emits light. The other two
transitions take place under the influence of light; they therefore depend on how much
light is present, and thus on the radiation-energy density $f(\omega)$.

The probabilities of transitions are given as transition rates $dW/dt$; $dW$ is the infinitesimal
number of molecules changing their state and $dt$ is an infinitesimal time interval. Now
spontaneous emission is independent of the presence of light and only depends on the
characteristics of the molecule and the number of molecules in state $m$. Therefore,

$$dW_1 = N(m) A_{mn}\, dt,$$

where $dW_1$ is the number of molecules undergoing spontaneous emission from $m$ to $n$ during
a time interval $dt$, and where $A_{mn}$ is some number depending on the states $n$ and $m$ (and in
particular not on the temperature). We denote by $dW_2$ the number of molecules absorbing light
and jumping from $n$ to $m$ during a time interval $dt$, and by $dW_3$ the number of molecules
jumping from $m$ to $n$ under the influence of light (getting the right kick). The probabilities are
determined by some constants $B_{mn}$ and $C_{mn}$, which are characteristic for the states $m$ and
$n$, and by the amount of light that has the right frequency. Thus $dW_2$ and $dW_3$ are proportional
to $f(\omega)$:

$$dW_2 = N(n) B_{mn} f\, dt, \qquad dW_3 = N(m) C_{mn} f\, dt.$$

Now consider an enclosed system of molecules that are in equilibrium with the
light in the system. Being in equilibrium means

$$dW_1 + dW_3 = dW_2.$$

Using the Boltzmann distribution we get

$$g_m e^{-E_m/kT}(A_{mn} + f C_{mn}) = g_n e^{-E_n/kT} f B_{mn}. \tag{3.13}$$

Now comes a basic assumption that Einstein makes: if $T$ becomes larger, the system gets very
hot and transitions will be more and more frequent. Therefore one assumes that as $T \to \infty$,
also $f \to \infty$. In this case the exponentials in (3.13) become 1, the term with $A_{mn}$
can be neglected, and we obtain

$$g_m C_{mn} = g_n B_{mn}. \tag{3.14}$$

From another point of view the assumption Einstein makes is natural. The relation
$g_m C_{mn} = g_n B_{mn}$ expresses that the processes $m \to n$ under induced emission and
$n \to m$ under absorption are symmetric; the numbers $C_{mn}$ and $B_{mn}$ only differ by the ratio
of the number of states with energy $E_n$ to the number of states with energy $E_m$. Indeed, taking
$g_m = g_n$, the process of induced emission is the time-reversed process of absorption.
Since the equations in nature show a time-reversal symmetry (in this case), we find in
this case $C_{mn} = B_{mn}$. If $g_n$ and $g_m$ are not equal, one has to correct for this and
multiply the probabilities by the corresponding multiplicities to get (3.14). With the
assumption (3.14) we find

$$f = \frac{A_{mn}/C_{mn}}{e^{(E_m - E_n)/kT} - 1}.$$

Inserting now $E_m - E_n = \hbar\omega$ and requiring that Wien's approximation (3.10) holds in the limit where
$\omega$ is large, we obtain

$$f(\omega) = \frac{a\,\omega^3}{e^{\hbar\omega/kT} - 1}.$$

In particular we find that $A_{mn}/C_{mn} = a\omega^3$, which relates the constants $A_{mn}$ and $C_{mn}$ to
the energy difference $E_m - E_n = \hbar\omega$. The constant $a$ depends neither on the frequency nor on the
temperature.
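To check the algebra, one can solve the balance equation (3.13) for $f$ numerically, with $C_{mn}$ fixed by Einstein's relation $g_mC_{mn} = g_nB_{mn}$, and compare the result with the closed form $f = (A_{mn}/C_{mn})/(e^{(E_m-E_n)/kT}-1)$. All parameter values below are hypothetical and in arbitrary units.

```python
import math

def f_from_balance(A, B, C, g_n, g_m, E_n, E_m, kT):
    """Solve g_m e^{-E_m/kT} (A + f C) = g_n e^{-E_n/kT} f B for f, i.e. (3.13)."""
    w_m = g_m * math.exp(-E_m / kT)
    w_n = g_n * math.exp(-E_n / kT)
    return w_m * A / (w_n * B - w_m * C)

# Hypothetical parameters (arbitrary units).
A, B = 2.0, 0.5
g_n, g_m = 1, 3
C = g_n * B / g_m          # Einstein's relation g_m C = g_n B
E_n, E_m, kT = 0.0, 1.2, 0.8

f_balance = f_from_balance(A, B, C, g_n, g_m, E_n, E_m, kT)
f_closed = (A / C) / (math.exp((E_m - E_n) / kT) - 1)
print(f_balance, f_closed)
```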



Modern derivation. We now discuss a relatively fast derivation that in addition gives a
value for the constant $a$ in Wien's approximation (3.10). We consider a box with the shape of a cube
with sides of length $L$. We assume that the precise shape of the box is not relevant in the
limit where the typical sizes are much larger than the wavelength. Then the only relevant
parameter is the volume $V = L^3$. We assume the walls of the box can absorb and emit
light; we furthermore assume that the walls are made of a conducting material. Away from
the walls light satisfies the Maxwell equations, but at the walls the components of the electric
field parallel to the walls have to vanish; if they did not vanish, the electrons in
the material of the wall would be accelerated, but then the system would not be in equilibrium. A
plane wave solution to the Maxwell equations is of the form

$$e^{i\omega t - ik_x x - ik_y y - ik_z z}, \qquad \omega^2 = c^2(k_x^2 + k_y^2 + k_z^2).$$

We can always choose a coordinate system that is aligned with the box. Then the boundary
conditions, which require standing waves vanishing at opposite walls, imply $e^{2ik_xL} = 1$ and
thus $k_x = \pi n_x/L$ for some integer $n_x$. The wave functions with
negative $n_x$ are identical to the corresponding wave functions with positive $n_x$; they just
differ by a phase. Therefore we may assume $n_x > 0$. For the other coordinate directions
the discussion is similar.

Thus we find that for each triple of integers $n = (n_x, n_y, n_z)$ we have a harmonic
oscillator with frequency

$$\omega_n = \frac{c\pi}{L}\,|n|, \qquad |n| = \sqrt{n_x^2 + n_y^2 + n_z^2}.$$

We now use the fact (proved below in Section 14.3) that for each harmonic oscillator the
energies are $E_n(r) = \hbar\omega_n(r + \frac12)$, $r = 0, 1, 2, \ldots$. Since energy is defined only up to a
constant shift, we subtract the zero-point energy $E_n(0) = \frac12\hbar\omega_n$ and take $E_n(r) = \hbar\omega_n r$.
The partition function is then

$$Z_n = \sum_{r=0}^{\infty} e^{-\beta\hbar\omega_n r} = \frac{1}{1 - e^{-\beta\hbar\omega_n}}.$$

Therefore the average energy in the mode corresponding to $n$ is

$$\bar E_n = -\frac{\partial}{\partial\beta}\ln Z_n = \frac{\hbar\omega_n}{e^{\beta\hbar\omega_n} - 1}.$$
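The closed form for the mean energy of a single mode can be compared with a direct evaluation of the Boltzmann average over the levels $E_n(r) = \hbar\omega_n r$, truncated at a large $r$. Units with $\hbar\omega_n = 1$ and the cutoff `r_max` are assumptions of this sketch.

```python
import math

def mean_energy_sum(beta_hw, r_max=2000):
    """Truncated Boltzmann average of r*hbar*omega over r = 0..r_max (hbar*omega = 1)."""
    weights = [math.exp(-beta_hw * r) for r in range(r_max + 1)]
    Z = sum(weights)
    return sum(r * w for r, w in enumerate(weights)) / Z

def mean_energy_closed(beta_hw):
    """hbar*omega / (e^{beta hbar omega} - 1) with hbar*omega = 1."""
    return 1 / math.expm1(beta_hw)

for beta_hw in (0.1, 1.0, 5.0):
    print(beta_hw, mean_energy_sum(beta_hw), mean_energy_closed(beta_hw))
```

For small $\beta\hbar\omega_n$ both values approach the classical equipartition value $kT$; for large $\beta\hbar\omega_n$ the mode is essentially frozen out.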



We now have to sum up all the energies for all modes. Since we are interested in the
behavior of $f(\omega)$ in the regime where $L$ is much larger than the wavelength, we
replace the sum over $n$ by an integral. We have to integrate over the positive octant where
$n_x > 0$, $n_y > 0$ and $n_z > 0$. Since all expressions are rotationally symmetric in $n$, we can
also integrate over all of $\mathbb{R}^3$ and divide by 8. We have not yet taken into account that light
has two polarizations. Therefore, for each $n$ there are two harmonic oscillators. The total
energy enclosed in the box is thus

$$E = \frac{2}{8}\int_{\mathbb{R}^3} dn_x\,dn_y\,dn_z\, \bar E_n = \pi\int_0^\infty dn\, n^2\, \frac{\hbar\omega_n}{e^{\beta\hbar\omega_n} - 1}.$$

Here we transformed to spherical coordinates, wrote $n = |n|$, and used
$\omega_n = \frac{c\pi}{L}\sqrt{n_x^2 + n_y^2 + n_z^2} = \frac{c\pi}{L}\,n$.
We now change the integration variable from $n$ to $\omega$. We have

$$\omega = \frac{c\pi}{L}\,n \quad\Longrightarrow\quad n^2\,dn = \Big(\frac{L}{c\pi}\Big)^3 \omega^2\,d\omega,$$

from which we find

$$E = \frac{L^3\hbar}{\pi^2 c^3}\int_0^\infty \frac{\omega^3}{e^{\beta\hbar\omega} - 1}\,d\omega.$$

With $L^3 = V$ being the volume, we thus find

$$f(\omega) = \frac{V\hbar}{\pi^2 c^3}\,\frac{\omega^3}{e^{\beta\hbar\omega} - 1}. \tag{3.15}$$
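The replacement of the sum over $n$ by an integral rests on the fact that the number of integer triples with positive entries and $|n| \le R$ approaches the octant volume $\pi R^3/6$ for large $R$. A brute-force count illustrates this; polarization is ignored here, and the function name is ours.

```python
import math

def count_modes(R):
    """Number of integer triples (n_x, n_y, n_z), all >= 1, with |n| <= R."""
    count = 0
    for nx in range(1, R + 1):
        for ny in range(1, R + 1):
            rem = R * R - nx * nx - ny * ny
            if rem < 1:
                break  # larger ny only makes rem smaller
            count += math.isqrt(rem)  # allowed n_z = 1, ..., isqrt(rem)
    return count

for R in (10, 50, 200):
    exact = count_modes(R)
    octant_volume = math.pi * R**3 / 6
    print(R, exact, round(exact / octant_volume, 4))
```

The ratio approaches 1 as $R$ grows; the deficit comes from lattice points missing near the coordinate planes, and it becomes negligible when $L$ is much larger than the wavelength.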

Of course, this $f$ only represents the radiation-energy density inside the black body. However,
up to some overall constants the above $f$ is the radiation-energy density of a black
body, since the emitted radiation is proportional to the energy density.

In the following section we shall derive Stefan's law. 



3.7 Stefan's law and Wien's displacement law 

From the calculated density (3.15) we can draw some conclusions, which we now briefly
discuss.

To calculate the total radiation that is emitted, we first calculate the total energy by
integrating (3.15) over all $\omega$. Substituting $x = \beta\hbar\omega$, we get for the total energy of the
light inside the black body

$$E = \frac{V\hbar}{\pi^2c^3}\int_0^\infty \frac{\omega^3\,d\omega}{e^{\beta\hbar\omega} - 1} = \frac{V(kT)^4}{\pi^2c^3\hbar^3}\int_0^\infty \frac{x^3\,dx}{e^x - 1}.$$

But we have

$$\int_0^\infty \frac{x^3\,dx}{e^x - 1} = \frac{\pi^4}{15},$$

and thus the energy density $u(T)$ is given by

$$u(T) = \frac{E}{V} = \frac{\pi^2 k^4}{15\,c^3\hbar^3}\,T^4.$$

We see that the energy density is expressible by fundamental constants and the fourth
power of the temperature. Since the energy density determines the total radiation emitted
per time interval, we see that the total energy a black body radiates per time interval is
proportional to $T^4$. This already explains Stefan's law, but in order to derive Stefan's law
we have to be a bit more careful.
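The value of the integral and the resulting constant $\pi^2k^4/(15c^3\hbar^3)$ in $u(T)$ can be checked numerically. The sketch uses the composite Simpson rule on a truncated interval (the integrand decays like $x^3e^{-x}$) and SI values of the constants ($k$ and $c$ exact, $\hbar$ rounded to ten digits); these are inputs we supply, not values from the text.

```python
import math

def integrand(x):
    # x^3 / (e^x - 1), with the removable singularity at x = 0 set to its limit 0
    return x**3 / math.expm1(x) if x > 0 else 0.0

def simpson(func, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

integral = simpson(integrand, 0.0, 60.0)
print(integral, math.pi**4 / 15)

k = 1.380649e-23        # Boltzmann constant, J/K
hbar = 1.054571817e-34  # reduced Planck constant, J s
c = 299792458.0         # speed of light, m/s
radiation_constant = math.pi**2 * k**4 / (15 * c**3 * hbar**3)
print(radiation_constant)  # u(T) = radiation_constant * T**4, in J m^-3 K^-4
```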

In order to see how much a black body will radiate, we pierce a small hole in the black
body. Let us say that the area of the hole is $dA$. Now the question is how many photons
will hit the hole from inside out. We fix a time $t$ and a small time interval $dt$. Only the
photons that are within a distance between $ct$ and $ct + c\,dt$ away from the hole are eligible
to pass through the hole in a time interval $dt$ after time $t$. We thus consider a thin shell of
a half sphere inside the black body at a distance $ct$ from the hole and of thickness $c\,dt$.
Light, however, spreads in all directions, and so not all the photons inside the shell are going
in the direction of the hole. Our task is to find the fraction of the total that does go through
the hole. This is a purely geometric question.

We introduce spherical coordinates around the hole: an angle $\varphi$ ranging from $0$ to $2\pi$
that goes around the hole, and a polar angle $\theta$ ranging from $0$ to $\pi/2$ (values above $\pi/2$
correspond to points outside the black body). We cut the half sphere of radius $ct$ into little
stripes by cutting for fixed $\theta$ along the angle $\varphi$; each stripe is a thin band of thickness $ct\,d\theta$
and of length $2\pi ct\sin\theta$. Consider a little 'cube' of volume $dV = (ct)^2\sin\theta\,d\theta\,d\varphi\,c\,dt$ in the shell.
The fraction of radiation going in the right direction is given by the solid angle that
$dA$ subtends as seen from the little cube, divided by the total solid angle $4\pi$. This fraction,
which we denote by $d\Omega$, is the projection of the surface $dA$ onto the sphere of radius $ct$
around the little cube, divided by the area of that sphere:

$$d\Omega = \frac{dA\cos\theta}{4\pi c^2t^2}.$$

The cube of volume $dV$ emits all the radiation present in the cube (since the light waves just
pass through), and that amounts to an energy $dE = u(T)\,dV$. From the little cube under
consideration the amount of radiation going in the right direction is thus

$$u\,d\Omega\,dV = u(T)\,\frac{c\,dA\cos\theta\sin\theta}{4\pi}\,d\theta\,d\varphi\,dt.$$

Note that the amount of radiation is independent of the radius of the half sphere. Since
the question is of a purely geometric nature, that is to be expected. We now get the total
amount of radiation by summing up all the contributions. Denoting by $dU$ the energy
that leaves the hole during the time interval $dt$, we have

$$\frac{dU}{dt} = \int_0^{\pi/2} d\theta \int_0^{2\pi} d\varphi\, u(T)\,\frac{c\,dA\cos\theta\sin\theta}{4\pi} = \frac{c}{4}\,u(T)\,dA.$$

For a black body that radiates over all its surface, and not only through one little hole, we
sum up the contributions over all little surfaces $dA$. In order that the above analysis still
holds, the shape of the black body needs to be such that radiation that exits the black body
does not enter again. If the black body is convex this requirement is met; e.g., we could
take a sphere. For a body of surface area $A$ we then find Stefan's law in the form

$$\frac{dU}{dt} = \frac{c}{4}\,u(T)\,A = \frac{\pi^2k^4}{60\hbar^3c^2}\,A\,T^4 = \sigma A T^4,$$

with Stefan's constant

$$\sigma = \frac{\pi^2k^4}{60\hbar^3c^2} \approx 5.67\cdot 10^{-8}\,\mathrm{W\,m^{-2}\,K^{-4}}.$$
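Both the geometric factor and Stefan's constant can be checked numerically: the double integral of $\cos\theta\sin\theta/(4\pi)$ over the half sphere should give $1/4$, which produces the factor $c/4$ in $dU/dt$, and $\sigma = \pi^2k^4/(60\hbar^3c^2)$ should be close to the tabulated value $5.670\cdot10^{-8}\,\mathrm{W\,m^{-2}\,K^{-4}}$. The midpoint rule and the SI constants are choices of this sketch.

```python
import math

def half_sphere_factor(n_theta=2000):
    """Midpoint rule for (1/4pi) * int_0^{2pi} dphi int_0^{pi/2} cos(t) sin(t) dt."""
    h = (math.pi / 2) / n_theta
    theta_sum = sum(math.cos((i + 0.5) * h) * math.sin((i + 0.5) * h)
                    for i in range(n_theta)) * h
    return 2 * math.pi * theta_sum / (4 * math.pi)  # the phi integral gives 2*pi

k = 1.380649e-23        # Boltzmann constant, J/K
hbar = 1.054571817e-34  # reduced Planck constant, J s
c = 299792458.0         # speed of light, m/s
sigma = math.pi**2 * k**4 / (60 * hbar**3 * c**2)

print(half_sphere_factor())  # approximately 0.25
print(sigma)                 # W m^-2 K^-4
```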

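We now turn to Wien's displacement law. We write the radiation-energy density as

$$f(\omega) = A\,\frac{\omega^3}{e^{\beta\hbar\omega} - 1}, \qquad A = \frac{V\hbar}{\pi^2c^3}.$$

Differentiating with respect to $\omega$ and setting the result to zero to obtain the position of
the maximum gives, with $x = \beta\hbar\omega$, the condition $3(e^x - 1) = xe^x$, i.e., the equations

$$x = 3 - 3e^{-x}, \qquad x = \hbar\omega_{\max}\beta.$$

We discard the trivial solution $x = 0$, since this corresponds to the behavior at $\omega = 0$. One
finds the other solution by solving the equation $3 - x = 3e^{-x}$ with numerical methods and
finds $x \approx 2.82$. Hence we have

$$\omega_{\max} = 2.82\,\frac{k}{\hbar}\,T.$$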

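The numerical solution mentioned above can be reproduced with a few lines of Python. The sketch below is our own illustration: Newton's method is simply a convenient choice, and the constant values are the exact 2019 SI values (an assumption not made in the text). It returns the nontrivial root $x \approx 2.8214$ and the corresponding $\omega_{\max} = x\,kT/\hbar$.

```python
import math

K_B = 1.380649e-23                     # Boltzmann constant, J/K (exact in the 2019 SI)
HBAR = 6.62607015e-34 / (2 * math.pi)  # reduced Planck constant, J s


def wien_root(x0: float = 3.0, tol: float = 1e-14, max_iter: int = 50) -> float:
    """Nontrivial root of g(x) = 3 - x - 3 exp(-x) by Newton's method.

    Starting at x0 = 3 (to the right of the root) keeps the iteration away from the trivial root x = 0.
    """
    x = x0
    for _ in range(max_iter):
        g = 3 - x - 3 * math.exp(-x)
        dg = -1 + 3 * math.exp(-x)
        step = g / dg
        x -= step
        if abs(step) < tol:
            break
    return x


def omega_max(temperature: float) -> float:
    """Angular frequency of the maximum of the spectral density: omega_max = x k T / hbar."""
    return wien_root() * K_B * temperature / HBAR


print(round(wien_root(), 4))  # 2.8214
```

Since $g(x) = 3 - x - 3e^{-x}$ is concave and decreasing to the right of $\log 3$, Newton's method started at $x_0 = 3$ approaches the root monotonically from the right.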


Part II 



Statistical mechanics 
Chapter 4



Phenomenological thermodynamics 



Part II discusses statistical mechanics from an algebraic perspective, concentrating on thermal equilibrium but treating the basic concepts in a more general framework. We give a treatment of equilibrium statistical mechanics and of the kinematic part of nonequilibrium statistical mechanics that derives, from a single basic assumption (Definition 6.1.1), the full structure of phenomenological thermodynamics and of statistical mechanics, except for the third law, which requires an additional quantization assumption.

This chapter gives a concise description of standard phenomenological equilibrium thermodynamics for single-phase systems in the absence of chemical reactions and electromagnetic fields. From the formulas provided, it is an easy step to the various examples and applications discussed in standard textbooks such as Callen [49] or Reichl [205]. A full discussion of global equilibrium would also involve the equilibrium treatment of multiple phases and chemical reactions. Since their discussion offers no new aspects compared with traditional textbook treatments, they are not treated here.

Our phenomenological approach is similar to that of Callen [49], who introduces the basic concepts by means of a few postulates from which everything else follows. The present setting is a modified version designed to match the more fundamental approach based on statistical mechanics. By specifying the kinematical properties of states outside equilibrium, we can replace his informal thermodynamic stability arguments (which depend on a dynamical assumption close to equilibrium) by rigorous mathematical arguments.



4.1 Standard thermodynamical systems 

We discuss here the special but very important case of thermodynamic systems describing the single-phase global equilibrium of matter composed of one or several kinds of substances in the absence of chemical reactions and electromagnetic fields. We call such systems standard thermodynamic systems; they are ubiquitous in applications. In particular, a standard system is considered to be uncharged, homogeneous, and isotropic, so that each finite region looks like any other and is very large in microscopic units.

The substances of fixed chemical composition are labeled by an index $j\in J$. A standard thermodynamic system is completely characterized by^ the mole number $N_j$ of each substance $j$, the corresponding chemical potential $\mu_j$ of substance $j$, the volume $V$, the pressure $P$, the temperature $T$, the entropy $S$, and the Hamilton energy $H$. These variables, the extensive variables $N_j, V, S, H$ and the intensive variables $\mu_j, P, T$, are jointly called the basic thermodynamic variables. We group the $N_j$ and the $\mu_j$ into vectors $N$ and $\mu$ indexed by $J$ and write $\mu\cdot N = \sum_{j\in J}\mu_jN_j$. In the special case of a pure substance, there is just a single kind of substance; then we drop the indices and have $\mu\cdot N = \mu N$. In this section, all numbers are real.

The mathematics of thermodynamics makes essential use of the concept of convexity. A set $X\subseteq\mathbb{R}^n$ is called convex if $tx + (1-t)y\in X$ for all $x, y\in X$ and all $t\in[0,1]$. A real-valued function $\phi$ is called convex in the convex set $X\subseteq\mathbb{R}^n$ if $\phi$ is defined on $X$ and, for all $x, y\in X$,

$$\phi(tx + (1-t)y) \le t\phi(x) + (1-t)\phi(y)\qquad\text{for } 0\le t\le 1.$$

Clearly, $\phi$ is convex iff for all $x, y\in X$, the function $\mu:[0,1]\to\mathbb{R}$ defined by

$$\mu(t) := \phi(x + t(y - x))$$

is convex. It is well known that, for twice continuously differentiable $\phi$, this is the case iff the second derivative $\mu''(t)$ is nonnegative for $0\le t\le 1$. Note that by a theorem of Aleksandrov (see Aleksandrov [4], Alberti & Ambrosio [2], Rockafellar [213]), convex functions are almost everywhere twice differentiable: For almost every $x\in X$, there exist a unique vector $\partial\phi(x)\in\mathbb{R}^n$, the gradient of $\phi$ at $x$, and a unique symmetric, positive semidefinite matrix $\partial^2\phi(x)\in\mathbb{R}^{n\times n}$, the Hessian of $\phi$ at $x$, such that

$$\phi(x + h) = \phi(x) + h^T\partial\phi(x) + \tfrac12 h^T\partial^2\phi(x)\,h + o(\|h\|^2)$$

for sufficiently small $h\in\mathbb{R}^n$. A function $\phi$ is called concave if $-\phi$ is convex. Thus, for a twice continuously differentiable function $\phi$ of a single variable $\tau$, $\phi$ is concave iff $\phi''(\tau)\le 0$ for $0\le\tau\le 1$.

4.1.1 Proposition. If $\phi$ is convex in the convex set $X$, then the function $\psi$ defined by

$$\psi(s, x) := s\,\phi(x/s)$$

is convex in the set $\{(s, x)\in\mathbb{R}\times\mathbb{R}^n \mid s > 0,\ x/s\in X\}$ and concave in the set $\{(s, x)\in\mathbb{R}\times\mathbb{R}^n \mid s < 0,\ x/s\in X\}$.

^In the terminology, we mainly follow the IUPAC convention (Alberty [3, Section 7]), except that we use the letter $H$ to denote the Hamilton energy, as customary in quantum mechanics. In equilibrium, $H$ equals the internal energy $U$. The Hamilton energy should not be confused with the enthalpy, which is usually denoted by $H$ but here is given in equilibrium by $H + PV$. For a history of thermodynamics notation, see Battino et al. [26].
Proof. It suffices to show that $\mu(t) := \psi(s + tk, x + th)$ is convex (concave) for all $s, x, h, k$ such that $s + tk > 0$ (resp. $< 0$). Let $z(t) := (x + th)/(s + tk)$ and $c := sh - kx$. Then

$$z'(t) = \frac{c}{(s + tk)^2},\qquad \mu(t) = (s + tk)\,\phi(z(t)),$$

hence

$$\mu'(t) = k\,\phi(z(t)) + \phi'(z(t))\,\frac{c}{s + tk},\qquad \mu''(t) = \frac{c^T\phi''(z(t))\,c}{(s + tk)^3},$$

which has the required sign, since $\phi''(z(t))$ is positive semidefinite. □

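Proposition 4.1.1 can be probed numerically. The Python sketch below is our own illustration, not a proof: it takes $X = \mathbb{R}$ (so $x/s$ is always admissible), samples random pairs of points and weights $t$, and counts failures of the convexity inequality for $s > 0$ and of the concavity inequality for $s < 0$. With the convex choice $\phi = \exp$ no failures should occur, while the nonconvex $\phi = \sin$ serves as a contrast.

```python
import math
import random


def perspective(phi, s: float, x: float) -> float:
    """psi(s, x) = s * phi(x / s), as in Proposition 4.1.1 (here X = R)."""
    return s * phi(x / s)


def count_violations(phi, sign: int, trials: int = 10_000, seed: int = 0, tol: float = 1e-9) -> int:
    """Sample the convexity inequality for s > 0 (sign=+1) or the concavity inequality for s < 0 (sign=-1).

    Returns the number of sampled pairs and weights t in [0, 1] for which the inequality fails by more than tol.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        s1, s2 = sign * rng.uniform(0.5, 3.0), sign * rng.uniform(0.5, 3.0)
        x1, x2 = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        t = rng.random()
        lhs = perspective(phi, t * s1 + (1 - t) * s2, t * x1 + (1 - t) * x2)
        rhs = t * perspective(phi, s1, x1) + (1 - t) * perspective(phi, s2, x2)
        # convex (sign=+1): lhs <= rhs;  concave (sign=-1): lhs >= rhs
        if sign * (lhs - rhs) > tol:
            failures += 1
    return failures


print(count_violations(math.exp, +1), count_violations(math.exp, -1))
print(count_violations(math.sin, +1) > 0)  # sin is not convex, so neither is its perspective
```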


Equilibrium thermodynamics is about characterizing so-called equilibrium states in terms 
of intensive and extensive variables and their relations, and comparing them with similar 
nonequilibrium states. In a nonequilibrium state, only extensive variables have a well- 
defined meaning; but these are not sufficient to characterize system behavior completely. 

All valid statements in the equilibrium thermodynamics of standard systems can be deduced 
from the following definition. 

4.1.2 Definition. (Phenomenological thermodynamics)

(i) Temperature $T$, pressure $P$, and volume $V$ are positive; mole numbers $N_j$ are nonnegative. The extensive variables $H, S, V, N$ are additive under the composition of disjoint subsystems.

(ii) There is a convex system function $\Delta$ of the intensive variables $T, P, \mu$, which is monotone increasing in $T$ and monotone decreasing in $P$, such that the intensive variables are related by the equation of state

$$\Delta(T, P, \mu) = 0. \tag{4.1}$$

The set of $(T, P, \mu)$ satisfying $T > 0$, $P > 0$, and the equation of state is called the state space.

(iii) The Hamilton energy $H$ satisfies the Euler inequality

$$H \ge TS - PV + \mu\cdot N \tag{4.2}$$

for all $(T, P, \mu)$ in the state space. Equilibrium states have well-defined intensive and extensive variables satisfying equality in (4.2). A system is in equilibrium if it is completely described by an equilibrium state.

This is the complete list of assumptions defining phenomenological equilibrium thermodynamics; the system function $\Delta$ can be determined either by fitting to experimental data or by calculation from a more fundamental description, cf. Theorem 6.2.1. All other properties follow from the system function. Thus, all equilibrium properties of a material are characterized by the system function $\Delta$.
Surfaces of nondifferentiability of the system function correspond to so-called phase transitions. The equation of state shows that, apart from possible phase transitions, the state space has the structure of an $(s-1)$-dimensional manifold in $\mathbb{R}^s$, where $s$ is the number of intensive variables; in case of a standard system, $s = |J| + 2$, so the dimension is one higher than the number of kinds of substances.

Thermodynamic systems with multiple phases are only piecewise homogeneous; each phase separately may be described as a standard thermodynamic system, but discussing the equilibrium at interfaces needs some additional effort, described in all textbooks on thermodynamics. Therefore, we consider only regions of state space where the system function $\Delta$ is twice continuously differentiable.



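Each equilibrium instance of the material is characterized by a particular state $(T, P, \mu)$, from which all equilibrium properties can be computed:

4.1.3 Theorem.

(i) In any equilibrium state, the extensive variables are given by

$$S = \Omega\,\frac{\partial\Delta}{\partial T}(T, P, \mu),\qquad V = -\Omega\,\frac{\partial\Delta}{\partial P}(T, P, \mu),\qquad N = \Omega\,\frac{\partial\Delta}{\partial\mu}(T, P, \mu), \tag{4.3}$$

and satisfy the Euler equation

$$H = TS - PV + \mu\cdot N. \tag{4.4}$$

Here $\Omega$ is a positive number called the system size.

(ii) In equilibrium, we have the Maxwell reciprocity relations

$$-\frac{\partial V}{\partial T} = \frac{\partial S}{\partial P},\qquad \frac{\partial N_j}{\partial T} = \frac{\partial S}{\partial\mu_j},\qquad \frac{\partial N_j}{\partial P} = -\frac{\partial V}{\partial\mu_j},\qquad \frac{\partial N_j}{\partial\mu_k} = \frac{\partial N_k}{\partial\mu_j}, \tag{4.5}$$

and the stability conditions

$$\frac{\partial S}{\partial T}\ge 0,\qquad \frac{\partial V}{\partial P}\le 0,\qquad \frac{\partial N_j}{\partial\mu_j}\ge 0. \tag{4.6}$$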

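Proof. At fixed $S, V, N$, inequality (4.2) holds in equilibrium with equality. Therefore the triple $(T, P, \mu)$ is a maximizer of $TS - PV + \mu\cdot N$ under the constraints $\Delta(T, P, \mu) = 0$, $T > 0$, $P > 0$. A necessary condition for a maximizer is the stationarity of the Lagrangian

$$L(T, P, \mu) = TS - PV + \mu\cdot N - \Omega\,\Delta(T, P, \mu)$$

for some Lagrange multiplier $\Omega$. Setting the partial derivatives to zero gives (4.3), and since the maximum is attained in equilibrium, the Euler equation (4.4) follows. The system size $\Omega$ is positive since $V > 0$ and $\Delta$ is decreasing in $P$. By (4.3), the derivatives in (4.5) and (4.6) are, up to sign, $\Omega$ times the entries of the Hessian matrix of $\Delta$,

$$\Sigma = \begin{pmatrix}
\dfrac{\partial^2\Delta}{\partial T^2} & \dfrac{\partial^2\Delta}{\partial T\,\partial P} & \dfrac{\partial^2\Delta}{\partial T\,\partial\mu}\\[2ex]
\dfrac{\partial^2\Delta}{\partial P\,\partial T} & \dfrac{\partial^2\Delta}{\partial P^2} & \dfrac{\partial^2\Delta}{\partial P\,\partial\mu}\\[2ex]
\dfrac{\partial^2\Delta}{\partial\mu\,\partial T} & \dfrac{\partial^2\Delta}{\partial\mu\,\partial P} & \dfrac{\partial^2\Delta}{\partial\mu^2}
\end{pmatrix}.$$

Since $\Sigma$ is symmetric, the Maxwell reciprocity relations follow. Since $\Delta$ is convex, $\Sigma$ is positive semidefinite; hence the diagonal elements of $\Sigma$ are nonnegative, giving the stability conditions. □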



Note that there are further stability conditions, since the determinants of all principal submatrices of $\Sigma$ must be nonnegative. In addition, $N_j\ge 0$ implies that $\Delta$ is monotone increasing in each $\mu_j$.

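The reciprocity relation $\partial N_j/\partial T = \partial S/\partial\mu_j$ and the stability conditions (4.6) can be checked numerically for a concrete system function. The Python sketch below is our own illustration: it uses the single-substance ideal gas of Example 4.1.4 with the hypothetical choice $\pi(T) = cT^{5/2}$ and a fixed system size $\Omega$, evaluates $S$ and $N$ from (4.3) in closed form, and differentiates them by central differences.

```python
import math

R = 8.31447    # J K^-1 mol^-1
c = 1.0        # hypothetical constant in pi(T) = c * T**2.5
OMEGA = 1.0    # fixed system size


def S_and_N(T, mu):
    """S = Omega dDelta/dT and N = Omega dDelta/dmu for Delta = pi(T) exp(mu/RT) - P (independent of P)."""
    e = math.exp(mu / (R * T))
    pi, dpi = c * T ** 2.5, 2.5 * c * T ** 1.5
    return OMEGA * (dpi - pi * mu / (R * T * T)) * e, OMEGA * pi * e / (R * T)


def derivatives(T, mu, hT=1e-3, hmu=1e-2):
    """Central differences of S and N with respect to T and mu."""
    (S_Tp, N_Tp), (S_Tm, N_Tm) = S_and_N(T + hT, mu), S_and_N(T - hT, mu)
    (S_mp, N_mp), (S_mm, N_mm) = S_and_N(T, mu + hmu), S_and_N(T, mu - hmu)
    return {
        "dS/dT": (S_Tp - S_Tm) / (2 * hT),
        "dN/dT": (N_Tp - N_Tm) / (2 * hT),
        "dS/dmu": (S_mp - S_mm) / (2 * hmu),
        "dN/dmu": (N_mp - N_mm) / (2 * hmu),
    }


d = derivatives(300.0, -5 * R * 300.0)
print(d["dN/dT"], d["dS/dmu"])           # Maxwell relation: these agree
print(d["dS/dT"] > 0, d["dN/dmu"] > 0)   # stability conditions (4.6)
```

For this system function $V = \Omega$ does not depend on $P$, so the middle condition $\partial V/\partial P\le 0$ holds trivially.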
4.1.4 Example. The equilibrium behavior of electrically neutral gases at sufficiently low pressure can be modelled as ideal gases. An ideal gas is defined by a system function of the form

$$\Delta(T, P, \mu) = \sum_j \pi_j(T)\,e^{\mu_j/RT} - P, \tag{4.7}$$

where the $\pi_j(T)$ are positive functions of the temperature,

$$R \approx 8.31447\ \mathrm{J\,K^{-1}\,mol^{-1}} \tag{4.8}$$

is the universal gas constant^, and we use the bracketing convention $\mu_j/RT = \mu_j/(RT)$. Differentiation with respect to $P$ shows that $\Omega = V$ is the system size, and from (4.1), (4.3), and equality in (4.2), we find that, in equilibrium,

$$P = \sum_j \pi_j(T)\,e^{\mu_j/RT},\qquad N_j = \frac{V}{RT}\,\pi_j(T)\,e^{\mu_j/RT},$$

$$S = V\sum_j\Big(\pi_j'(T) - \frac{\mu_j}{RT^2}\,\pi_j(T)\Big)e^{\mu_j/RT},\qquad H = V\sum_j\big(T\pi_j'(T) - \pi_j(T)\big)e^{\mu_j/RT}.$$

Expressed in terms of $T, V, N$, we have

$$PV = RT\sum_j N_j,\qquad \mu_j = RT\log\frac{RT\,N_j}{\pi_j(T)\,V},$$

$$H = \sum_j h_j(T)\,N_j,\qquad h_j(T) = RT\Big(T\frac{d}{dT}\log\pi_j(T) - 1\Big),$$

from which $S$ can be computed by means of the Euler equation (4.4). In particular, for one mole of a single substance, defined by $N = 1$, we get the ideal gas law

$$PV = RT \tag{4.9}$$

discovered by Clapeyron [56]; cf. Jensen [126].

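The formulas of Example 4.1.4 can be checked numerically. The Python sketch below assumes, purely for illustration, two substances with hypothetical temperature functions $\pi_j(T) = c_jT^{5/2}$ (the constants $c_j$ are made up, and all quantities are in SI units). It obtains $S$, $V$, $N$ from (4.3) by central finite differences of the system function (4.7), fixes $P$ by the equation of state, and then checks $PV = RT\sum_jN_j$ and $H = \sum_jh_j(T)N_j$, where $h_j(T) = \frac32RT$ for this choice of $\pi_j$.

```python
import math

R = 8.31447                 # J K^-1 mol^-1, the value quoted in (4.8)
C = [1.0, 0.5]              # hypothetical constants c_j in pi_j(T) = c_j * T**2.5


def pi_j(j, T):
    return C[j] * T ** 2.5


def system_function(T, P, *mu):
    """Ideal gas system function (4.7): Delta = sum_j pi_j(T) exp(mu_j/(RT)) - P."""
    return sum(pi_j(j, T) * math.exp(m / (R * T)) for j, m in enumerate(mu)) - P


def partial(f, args, i, rel=1e-6):
    """Central finite-difference approximation to the partial derivative of f(*args) in args[i]."""
    h = rel * max(1.0, abs(args[i]))
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def equilibrium_state(T, mu, omega):
    """Return P, V, S, N, H from (4.1), (4.3) and the Euler equation (4.4)."""
    P = sum(pi_j(j, T) * math.exp(m / (R * T)) for j, m in enumerate(mu))  # Delta = 0
    args = [T, P, *mu]
    S = omega * partial(system_function, args, 0)
    V = -omega * partial(system_function, args, 1)
    N = [omega * partial(system_function, args, 2 + j) for j in range(len(mu))]
    H = T * S - P * V + sum(m * n for m, n in zip(mu, N))
    return P, V, S, N, H


T, omega = 300.0, 0.0248
mu = [-5 * R * T, -6 * R * T]
P, V, S, N, H = equilibrium_state(T, mu, omega)
print(P * V / (R * T * sum(N)))               # close to 1 (ideal gas law)
print(H / sum(1.5 * R * T * n for n in N))    # close to 1 (H = sum_j h_j(T) N_j)
```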
In general, the difference $h_j(T) - h_j(T')$ can be found experimentally by measuring the energy needed for raising or lowering the temperature of pure substance $j$ from $T'$ to $T$ while keeping $N_j$ constant. In terms of infinitesimal increments, the heat capacities

$$C_j(T) = dh_j(T)/dT,$$

we have

$$h_j(T) = h_j(T') + \int_{T'}^{T} dT\,C_j(T).$$

From the definition of $h_j(T)$, we find that

$$\pi_j(T) = \pi_j(T')\exp\int_{T'}^{T} dT\,\frac{h_j(T) + RT}{RT^2}.$$

^For the internationally recommended values of this and other constants, their accuracy, determination, and history, see CODATA [59].



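To illustrate the formula for $\pi_j(T)$, the following Python sketch (our own; the constant heat capacity $C_j = \frac32R$ and the temperatures are arbitrary illustrative choices) integrates $(h_j(T) + RT)/(RT^2)$ by Simpson's rule for $h_j(T) = C_jT$ and compares the resulting ratio $\pi_j(T)/\pi_j(T')$ with the power law $(T/T')^{1 + C_j/R}$ obtained by integrating by hand.

```python
import math

R = 8.31447  # J K^-1 mol^-1


def log_pi_ratio(h, T_ref, T, n=2000):
    """log(pi_j(T) / pi_j(T_ref)) = int_{T_ref}^{T} (h(t) + R t) / (R t^2) dt, by composite Simpson's rule."""
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of subintervals
    step = (T - T_ref) / n
    f = lambda t: (h(t) + R * t) / (R * t * t)
    total = f(T_ref) + f(T)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(T_ref + k * step)
    return total * step / 3


C_j = 1.5 * R                      # illustrative constant heat capacity
h = lambda t: C_j * t              # h_j(T) = C_j T, i.e. the additive constant in h_j is set to zero
ratio = math.exp(log_pi_ratio(h, 300.0, 600.0))
print(ratio, (600.0 / 300.0) ** (1 + C_j / R))  # both close to 2**2.5
```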

Thus there are two undetermined integration constants for each kind of substance. These cannot be determined experimentally as long as we are in the range of validity of the ideal gas approximation. Indeed, if we pick arbitrary constants $\alpha_j$ and $\gamma_j$ and replace $\pi_j(T)$, $\mu_j$, $H$, and $S$ by

$$\tilde\pi_j(T) = e^{\alpha_j - \gamma_j/RT}\,\pi_j(T),\qquad \tilde\mu_j = \mu_j + \gamma_j - RT\alpha_j,\qquad \tilde H = H + \sum_j\gamma_jN_j,\qquad \tilde S = S + \sum_j R\alpha_jN_j,$$

all relations remain unchanged. Thus, the Hamilton energy and the entropy of an ideal gas are only determined up to an arbitrary linear combination of the mole numbers. This is an instance of the deeper problem of determining under which conditions thermodynamic variables are controllable; cf. the discussion in the context of Example 7.1.1 below.

The gauge freedom (present only in the ideal gas) can be fixed by choosing a particular standard temperature $T_0$ and setting arbitrarily $h_j(T_0) = 0$, $\mu_j(T_0) = 0$. Alternatively, at sufficiently large temperature $T$, heat capacities are usually nearly constant, and making use of the gauge freedom, we may simply assume that

$$h_j(T) = h_{j0}T,\qquad \pi_j(T) = \pi_{j0}\,T^{1 + h_{j0}/R}\qquad\text{for large } T.$$

Note that this gauge freedom is present only for ideal gases.



4.2 The laws of thermodynamics

The possibility of measuring temperature is sometimes called the zeroth law of thermodynamics (Fowler & Guggenheim [80]). For the history of temperature, see Roller [215] and Truesdell [243]. The ideal gas law (4.9) is the basis for the construction of a gas thermometer: The amount of expansion of volume in a long, thin tube can easily be read off from a scale along the tube. We have $V = aL$, where $a$ is the cross-sectional area and $L$ is the length of the filled part of the tube, hence $T = (aP/R)L$. Thus, at constant pressure, the temperature of the gas is proportional to $L$.

We say that two thermodynamic systems are brought into good thermal contact if the joint system tends after a short time to an equilibrium state. To measure the temperature of a system, one brings it into thermal contact with a thermometer and waits until equilibrium is established. The system and the thermometer will then have the same temperature, which can be read off from the thermometer. If the system is much larger than the thermometer, this temperature will be essentially the same as the temperature of the system before the measurement. For a survey of the problems involved in defining and measuring temperature outside equilibrium, see Casas-Vázquez & Jou [52].

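The gas thermometer reading is simple arithmetic. In the following Python sketch, the tube cross section, the pressure, and the amount of gas are hypothetical values chosen for illustration. The text assumes one mole; the sketch allows a general mole number $n$ via $PV = nRT$, which follows from $PV = RT\sum_jN_j$ in Example 4.1.4.

```python
R = 8.31447  # J K^-1 mol^-1


def thermometer_temperature(length_m, cross_section_m2, pressure_pa, moles=1.0):
    """T = P V / (n R) with V = a L; at fixed pressure, T is proportional to the filled length L."""
    return pressure_pa * cross_section_m2 * length_m / (moles * R)


# Hypothetical thermometer: 1 mmol of gas in a tube of 1 cm^2 cross section at atmospheric pressure.
for L in (0.20, 0.224, 0.30):
    print(L, round(thermometer_temperature(L, 1e-4, 101325.0, moles=1e-3), 1))
```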


To be able to formulate the first law of thermodynamics, we need the concept of a reversible change of states, i.e., changes preserving the equilibrium condition. For use in later sections, we define the concept in a slightly more general form, writing $\alpha$ for $P$ and $\mu$ jointly.

4.2.1 Definition. A state variable is an almost everywhere continuously differentiable function $\phi(T, \alpha)$ defined on the state space (or a subset of it). Temporal changes in a state variable that occur when the boundary conditions are kept fixed are called spontaneous changes. A reversible transformation is a continuously differentiable mapping

$$\lambda\mapsto(T(\lambda), \alpha(\lambda))$$

from a real interval into the state space; thus $\Delta(T(\lambda), \alpha(\lambda)) = 0$. The differential

$$d\phi = \frac{\partial\phi}{\partial T}\,dT + \frac{\partial\phi}{\partial\alpha}\cdot d\alpha, \tag{4.10}$$

obtained by multiplying the chain rule for $\frac{d}{d\lambda}\phi(T(\lambda), \alpha(\lambda))$ by $d\lambda$, describes the change of a state variable under arbitrary (infinitesimal) reversible transformations. In formal mathematical terms, differentials are exact linear forms on the state space manifold; cf. Chapter 11.

Reversible changes per se have nothing to do with changes in time. However, by sufficiently slow, quasistatic changes of the boundary conditions, reversible changes can often be realized approximately as temporal changes. The degree to which this is possible determines the efficiency of thermodynamic machines. The analysis of the efficiency by means of the so-called Carnot cycle was the historical origin of thermodynamics.

The state space is often parameterized by different sets of state variables, as required by the application. If $T = T(\kappa, \lambda)$, $\alpha = \alpha(\kappa, \lambda)$ is such a parameterization, then the state variable $g(T, \alpha)$ can be written as a function of $(\kappa, \lambda)$,

$$g(\kappa, \lambda) := g(T(\kappa, \lambda), \alpha(\kappa, \lambda)). \tag{4.11}$$

This notation, while mathematically ambiguous, is common in the literature; the names of the arguments decide which function is intended. When partial derivatives are written without arguments, this leads to serious ambiguities. These can be resolved by writing $\left(\frac{\partial g}{\partial\lambda}\right)_\kappa$ for the partial derivative of (4.11) with respect to $\lambda$; it can be evaluated using (4.10), giving the chain rule

$$\left(\frac{\partial g}{\partial\lambda}\right)_\kappa = \frac{\partial g}{\partial T}\left(\frac{\partial T}{\partial\lambda}\right)_\kappa + \frac{\partial g}{\partial\alpha}\cdot\left(\frac{\partial\alpha}{\partial\lambda}\right)_\kappa. \tag{4.12}$$

Here the partial derivatives in the original parameterization by the intensive variables are written without parentheses.
Differentiating the equation of state (4.1), using the chain rule (4.10), multiplying by $\Omega$, and simplifying using (4.3) gives the Gibbs–Duhem equation

$$0 = S\,dT - V\,dP + N\cdot d\mu \tag{4.13}$$

for reversible changes. If we differentiate the Euler equation (4.4), we obtain

$$dH = T\,dS + S\,dT - P\,dV - V\,dP + \mu\cdot dN + N\cdot d\mu,$$

and using (4.13), this simplifies to the first law of thermodynamics

$$dH = T\,dS - P\,dV + \mu\cdot dN. \tag{4.14}$$

Historically, the first law of thermodynamics took on this form only gradually, through work by Mayer [168], Joule [128], Helmholtz [112], and Clausius [57].

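The Gibbs–Duhem equation (4.13) and the first law (4.14) can be checked along a concrete reversible transformation. The sketch below is our own illustration: it uses the single-substance ideal gas of Example 4.1.4 with the hypothetical choice $\pi(T) = cT^{5/2}$, a made-up path $\lambda\mapsto(T(\lambda), \mu(\lambda))$ with $P(\lambda)$ fixed by the equation of state, and a system size $\Omega(\lambda)$ that is also allowed to vary. Derivatives along the path are approximated by central differences, so both residuals should be zero up to discretization error.

```python
import math

R = 8.31447   # J K^-1 mol^-1
c = 1.0       # hypothetical constant in pi(T) = c * T**2.5


def state(lam):
    """Hypothetical path lam -> (T, mu, Omega); returns (T, P, mu, S, V, N, H) in equilibrium."""
    T = 300.0 + 50.0 * lam
    mu = -R * T * (5.0 - lam)
    omega = 1.0 + lam
    e = math.exp(mu / (R * T))
    pi, dpi = c * T ** 2.5, 2.5 * c * T ** 1.5
    P = pi * e                                        # equation of state Delta = 0
    S = omega * (dpi - pi * mu / (R * T * T)) * e     # S = Omega dDelta/dT
    V = omega                                         # V = -Omega dDelta/dP
    N = omega * pi * e / (R * T)                      # N = Omega dDelta/dmu
    H = T * S - P * V + mu * N                        # Euler equation (4.4)
    return T, P, mu, S, V, N, H


def residuals(lam, h=1e-4):
    """Gibbs-Duhem and first-law residuals per unit d(lam), plus a scale for comparison."""
    T, P, mu, S, V, N, H = state(lam)
    dT, dP, dmu, dS, dV, dN, dH = [(a - b) / (2 * h) for a, b in zip(state(lam + h), state(lam - h))]
    gibbs_duhem = S * dT - V * dP + N * dmu
    first_law = dH - (T * dS - P * dV + mu * dN)
    scale = abs(S * dT) + abs(V * dP) + abs(N * dmu)
    return gibbs_duhem, first_law, scale


for lam in (0.0, 0.3, 0.7):
    gd, fl, scale = residuals(lam)
    print(lam, gd / scale, fl / scale)   # both ratios should be tiny (finite-difference error only)
```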
Considering global equilibrium from a fundamental point of view, the extensive variables 
are the variables that are conserved or at least change so slowly that they may be regarded 
as time independent on the time scale of interest. In the absence of chemical reactions, the 
mole numbers, the entropy, and the Hamilton energy are conserved; the volume is a system 
size variable which, in the fundamental view, must be taken as infinite (thermodynamic 
limit) to exclude the unavoidable interaction with the environment. However, real systems 
are always in contact with their environment, and the conservation laws are approximate 
only. In thermodynamics, the description of the system boundary is generally reduced to 
the degrees of freedom observable at a given resolution. 

The result of this reduced description (for derivations, see, e.g., Balian [17], Grabert [97], Rau & Müller [203]) is a dynamical effect called dissipation (Thomson [241]). It is described by the second law of thermodynamics, which was discovered by Clausius [58]. The Euler inequality (4.2) together with the Euler equation (4.4) only express the nondynamical part of the second law since, in equilibrium thermodynamics, dynamical questions are ignored: Axiom (iii) says that if $S, V, N$ are conserved (thermal, mechanical, and chemical isolation), then the internal energy

$$U := TS - PV + \mu\cdot N \tag{4.15}$$

is minimal in equilibrium; if $T, V, N$ are conserved (mechanical and chemical isolation of a system at constant temperature $T$), then the Helmholtz (free) energy

$$A := U - TS = -PV + \mu\cdot N$$

is minimal in equilibrium; and if $T, P, N$ are conserved (chemical isolation of a system at constant temperature $T$ and pressure $P$), then the Gibbs (free) energy

$$G := A + PV = \mu\cdot N$$

is minimal in equilibrium.

The third law of thermodynamics, due to Nernst [181], says that entropy is nonnegative. In view of (4.3), this is equivalent to the monotonicity of $\Delta(T, P, \mu)$ in $T$.
4.3 Consequences of the first law

The first law of thermodynamics describes the observable energy balance in a reversible process. The total energy flux $dH$ into the system is composed of the thermal energy flux or heat flux $T\,dS$, the mechanical energy flux $-P\,dV$, and the chemical energy flux $\mu\cdot dN$.

The Gibbs–Duhem equation (4.13) describes the energy balance necessary to compensate the changes $d(TS) = T\,dS + S\,dT$ of thermal energy, $d(PV) = P\,dV + V\,dP$ of mechanical energy, and $d(\mu\cdot N) = \mu\cdot dN + N\cdot d\mu$ of chemical energy in the energy contributions to the Euler equation, so that the Euler equation remains valid during a reversible transformation. Indeed, both equations together imply that $d(TS - PV + \mu\cdot N - H)$ vanishes, which expresses the preservation of the Euler equation.

Related to the various energy fluxes are the thermal work $\int T\,dS$, the mechanical work $-\int P\,dV$, and the chemical work $\int\mu\cdot dN$ performed in a reversible transformation. The various kinds of work generally depend on the path through the state space; however, the mechanical work depends only on the end points if the associated process is conservative.

As is apparent from the formulas given, thermal work is done by changing the entropy of the system, mechanical work by changing the volume, and chemical work by changing the mole numbers. In particular, in case of thermal, mechanical, or chemical isolation, the corresponding fluxes vanish identically. Thus, constant $S$ characterizes thermally isolated, adiabatic systems, constant $V$ characterizes mechanically isolated, closed^ systems, and constant $N$ characterizes chemically isolated, impermeable systems. Note that this constancy only holds when all assumptions for a standard system are valid: global equilibrium, a single phase, and the absence of chemical reactions. Of course, these boundary conditions are somewhat idealized situations, which, however, can be approximately realized in practice and are of immense scientific and technological importance.

The first law shows that, in appropriate units, the temperature $T$ is the amount of energy needed to increase the entropy $S$ of a mechanically and chemically isolated system by one unit. The pressure $P$ is, in appropriate units, the amount of energy needed to decrease the volume $V$ of a thermally and chemically isolated system by one unit. In particular, increasing pressure decreases the volume; this explains the minus sign in the definition of $P$.


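To illustrate the path dependence of thermal and mechanical work, the following Python sketch (our own illustration) takes one mole of a monatomic ideal gas, assuming $C_V = \frac32R$ and $S(T, V) = C_V\log T + R\log V + \text{const}$ (which is what Example 4.1.4 gives for $\pi(T)\propto T^{5/2}$), and compares two reversible paths between the same end points. The thermal and the mechanical work differ between the two paths, while their sum, the change in $H$ (here $C_V$ times the temperature change, since $N$ is constant), does not.

```python
import math

R = 8.31447          # J K^-1 mol^-1
CV = 1.5 * R         # assumed molar heat capacity at constant volume (monatomic ideal gas)


def entropy(T, V):
    """S(T, V) for one mole, up to an additive constant that cancels in all differences."""
    return CV * math.log(T) + R * math.log(V)


def pressure(T, V):
    return R * T / V  # ideal gas law (4.9)


def segment(a, b, n=1000):
    """Straight segment from a = (T, V) to b = (T, V), as n + 1 points."""
    return [(a[0] + (b[0] - a[0]) * k / n, a[1] + (b[1] - a[1]) * k / n) for k in range(n + 1)]


def works(path):
    """Thermal work int T dS and mechanical work -int P dV along a polygonal path (midpoint rule)."""
    thermal = mechanical = 0.0
    for (T0, V0), (T1, V1) in zip(path, path[1:]):
        Tm, Vm = 0.5 * (T0 + T1), 0.5 * (V0 + V1)
        thermal += Tm * (entropy(T1, V1) - entropy(T0, V0))
        mechanical -= pressure(Tm, Vm) * (V1 - V0)
    return thermal, mechanical


start, end = (300.0, 0.01), (600.0, 0.02)
# Path 1: isothermal expansion at 300 K, then heating at constant volume.
path1 = segment(start, (300.0, 0.02)) + segment((300.0, 0.02), end)[1:]
# Path 2: heating at constant volume, then isothermal expansion at 600 K.
path2 = segment(start, (600.0, 0.01)) + segment((600.0, 0.01), end)[1:]
for name, path in (("path 1", path1), ("path 2", path2)):
    q, w = works(path)
    print(name, round(q, 1), round(w, 1), round(q + w, 1))
```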
^Note that the term 'closed system' also has a much more general interpretation, which we do not use in this chapter, namely as a conservative dynamical system.
The chemical potential $\mu_j$ is, in appropriate units, the amount of energy needed to increase the mole number $N_j$ of a thermally and mechanically isolated system by one. With the traditional units, temperature, pressure, and chemical potentials are no longer energies.

We see that the entropy and the volume behave just like the mole number. This analogy can be deepened by observing that mole numbers are the natural measure of the amounts of "matter" of each kind in a system, and chemical energy flux is accompanied by adding or removing matter. Similarly, volume is the natural measure of the amount of "space" a system occupies, and mechanical energy flux in a standard system is accompanied by adding or removing space. Thus we may regard entropy as the natural measure of the amount of "heat" contained in a system^, since thermal energy flux is accompanied by adding or removing heat. Looking at other extensive quantities, we also recognize energy as the natural measure of the amount of "power", momentum as the natural measure of the amount of "force", and mass as the natural measure of the amount of "inertia" of a system. In each case, the notions in quotation marks are the colloquial terms which are associated in ordinary life with the more precise, formally defined physical quantities. For historical reasons, the words heat, power, and force are used in physics with a meaning different from the colloquial terms "heat", "power", and "force".



4.4 Consequences of the second law 

The second law is centered around the impossibility of perpetual motion machines due to the inevitable loss of energy by dissipation such as friction (see, e.g., Bowden & Leben [44]), uncontrolled radiation, etc. This means that, unless energy is continually provided from the outside, energy is lost with time until a metastable state is attained, which usually is an equilibrium state. Therefore, the energy at equilibrium is minimal under the circumstances dictated by the boundary conditions. In a purely kinematic setting as in our treatment, the approach to equilibrium cannot be studied, and only the minimal energy principles (one for each set of boundary conditions) remain.

Traditionally, the second law is often expressed in the form of an extremal principle for some thermodynamic potential. We derive here the extremal principles for the Hamilton energy, the Helmholtz energy, and the Gibbs energy^, which give rise to the Hamilton potential

$$U(S, V, N) := \max_{T, P, \mu}\{TS - PV + \mu\cdot N \mid \Delta(T, P, \mu) = 0;\ T > 0;\ P > 0\},$$

the Helmholtz potential

$$A(T, V, N) := \max_{P, \mu}\{-PV + \mu\cdot N \mid \Delta(T, P, \mu) = 0;\ T > 0;\ P > 0\},$$

and the Gibbs potential

$$G(T, P, N) := \max_{\mu}\{\mu\cdot N \mid \Delta(T, P, \mu) = 0;\ T > 0;\ P > 0\}.$$

The Gibbs potential is of particular importance for everyday processes, since the latter frequently happen at approximately constant temperature, pressure, and mole number. (For other thermodynamic potentials used in practice, see Alberty [3]; for the maximum entropy principle, see Section 7.7.)

^Thus, entropy is the modern replacement for the historical concepts of phlogiston and caloric, which failed to give a correct account of heat phenomena. Phlogiston turned out to be 'missing oxygen', an early analogue of the picture of positrons as holes, or 'missing electrons', in the Dirac sea. Caloric was a massless substance of heat which had almost the right properties, explained many effects correctly, and fell out of favor only after it became known that caloric could be generated in arbitrarily large amounts from mechanical energy, thus discrediting the idea of heat being a substance. (For the precise relation of entropy and caloric, see Kuhn [145, 146], Walter [255], and the references quoted there.) In the modern picture, the extensivity of entropy models the substance-like properties of heat. But as there are no particles of space whose mole number is proportional to the volume, so there are no particles of heat whose mole number is proportional to the entropy. Nevertheless, the introduction of heat particles on a formal level has some uses; see, e.g., Streater [236].

^The different potentials are related by so-called Legendre transforms; cf. Rockafellar [214] for the mathematical properties of Legendre transforms, Arnol'd [13] for their application in mechanics, and Alberty [3] for their application in chemistry.

4.4.1 Theorem. (Extremal principles)

(i) In an arbitrary state,

$$H \ge U(S, V, N), \tag{4.16}$$

with equality iff the state is an equilibrium state. The remaining thermodynamic variables are then given by

$$T = \frac{\partial U}{\partial S}(S, V, N),\qquad P = -\frac{\partial U}{\partial V}(S, V, N),\qquad \mu = \frac{\partial U}{\partial N}(S, V, N),\qquad H = U(S, V, N).$$

In particular, an equilibrium state is uniquely determined by the values of $S$, $V$, and $N$.

(ii) In an arbitrary state,

$$H - TS \ge A(T, V, N), \tag{4.17}$$

with equality iff the state is an equilibrium state. The remaining thermodynamic variables are then given by

$$S = -\frac{\partial A}{\partial T}(T, V, N),\qquad P = -\frac{\partial A}{\partial V}(T, V, N),\qquad \mu = \frac{\partial A}{\partial N}(T, V, N),\qquad H = A(T, V, N) + TS.$$

In particular, an equilibrium state is uniquely determined by the values of $T$, $V$, and $N$.

(iii) In an arbitrary state,

$$H - TS + PV \ge G(T, P, N), \tag{4.18}$$

with equality iff the state is an equilibrium state. The remaining thermodynamic variables are then given by

$$S = -\frac{\partial G}{\partial T}(T, P, N),\qquad V = \frac{\partial G}{\partial P}(T, P, N),\qquad \mu = \frac{\partial G}{\partial N}(T, P, N),\qquad H = G(T, P, N) + TS - PV.$$

In particular, an equilibrium state is uniquely determined by the values of $T$, $P$, and $N$.
Proof. We prove (ii); the other two cases are entirely similar. (4.17) and the statement about equality are a direct consequence of Axiom 4.1.2(iii). Thus, the difference $H - TS - A(T, V, N)$ takes its minimum value zero at the equilibrium value of $T$. Therefore, its derivative with respect to $T$ vanishes, which gives the formula for $S$. To get the formulas for $P$ and $\mu$, we note that for constant $T$, the first law (4.14) implies

$$dA = d(H - TS) = dH - T\,dS = -P\,dV + \mu\cdot dN.$$

For the reversible transformations which only change $V$ or $N_j$, we conclude that $dA = -P\,dV$ and $dA = \mu_j\,dN_j$, respectively. Solving for $P$ and $\mu_j$, respectively, implies the formulas for $P$ and $\mu_j$. □

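Theorem 4.4.1(ii) can be illustrated numerically. For a single-substance ideal gas with the hypothetical choice $\pi(T) = cT^{5/2}$, the equation of state gives $\mu = RT\log(P/\pi(T))$, so the definition of the Helmholtz potential reduces to a one-dimensional maximization over $P$. The Python sketch below is our own: golden-section search over $\log P$ is simply a convenient choice, valid here because the objective is concave in $\log P$. It compares the result with the closed form $A(T, V, N) = NRT\big(\log\frac{NRT}{V\pi(T)} - 1\big)$, obtained by carrying out the maximization by hand, and checks $P = -\partial A/\partial V$ by a finite difference.

```python
import math

R = 8.31447   # J K^-1 mol^-1
c = 1.0       # hypothetical constant in pi(T) = c * T**2.5


def mu_on_state_space(T, P):
    """Solve Delta(T, P, mu) = pi(T) exp(mu/RT) - P = 0 for mu."""
    return R * T * math.log(P / (c * T ** 2.5))


def helmholtz_by_max(T, V, N, lo=-30.0, hi=30.0, tol=1e-10):
    """A(T,V,N) = max_P {-P V + mu N} on the state space, by golden-section search in u = log P."""
    f = lambda u: -math.exp(u) * V + N * mu_on_state_space(T, math.exp(u))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                      # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = f(x2)
        else:                            # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = f(x1)
    u = 0.5 * (a + b)
    return f(u), math.exp(u)             # (value of A, maximizing pressure)


def helmholtz_closed_form(T, V, N):
    return N * R * T * (math.log(N * R * T / (V * c * T ** 2.5)) - 1.0)


T, V, N = 300.0, 0.0248, 1.0
A_num, P_num = helmholtz_by_max(T, V, N)
h = 1e-7
P_from_A = -(helmholtz_closed_form(T, V + h, N) - helmholtz_closed_form(T, V - h, N)) / (2 * h)
print(A_num, helmholtz_closed_form(T, V, N))   # agree to many digits
print(P_num, P_from_A, N * R * T / V)          # all close to the ideal gas pressure
```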
The above results imply that one can regard each thermodynamic potential as a complete 
alternative way to describe the manifold of thermal states and hence all equilibrium prop- 
erties. This is very important in practice, where one usually describes thermodynamic 
material properties in terms of the Helmholtz or Gibbs potential, using models like NRTL 
(Renon & Prausnitz [206], Prausnitz et al. [199]) or SAFT (Chapman et al. [53, 54]). 

The additivity of extensive quantities is reflected in corresponding properties of the ther- 
modynamic potentials: 

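4.4.2 Theorem. The potentials $U(S, V, N)$, $A(T, V, N)$, and $G(T, P, N)$ satisfy, for real $\lambda, \lambda^1, \lambda^2\ge 0$,

$$U(\lambda S, \lambda V, \lambda N) = \lambda U(S, V, N), \tag{4.19}$$

$$A(T, \lambda V, \lambda N) = \lambda A(T, V, N), \tag{4.20}$$

$$G(T, P, \lambda N) = \lambda G(T, P, N), \tag{4.21}$$

$$U(\lambda^1S^1 + \lambda^2S^2, \lambda^1V^1 + \lambda^2V^2, \lambda^1N^1 + \lambda^2N^2) \le \lambda^1U(S^1, V^1, N^1) + \lambda^2U(S^2, V^2, N^2), \tag{4.22}$$

$$A(T, \lambda^1V^1 + \lambda^2V^2, \lambda^1N^1 + \lambda^2N^2) \le \lambda^1A(T, V^1, N^1) + \lambda^2A(T, V^2, N^2), \tag{4.23}$$

$$G(T, P, \lambda^1N^1 + \lambda^2N^2) \le \lambda^1G(T, P, N^1) + \lambda^2G(T, P, N^2). \tag{4.24}$$

In particular, these potentials are convex in $S$, $V$, and $N$.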



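Proof. The first three equations express homogeneity and are a direct consequence of the definitions. Inequality (4.23) holds since, for suitable $P$ and $\mu$ (those attaining the maximum in the definition of the left-hand side),

$$\begin{aligned}
A(T, \lambda^1V^1 + \lambda^2V^2, \lambda^1N^1 + \lambda^2N^2) &= -P(\lambda^1V^1 + \lambda^2V^2) + \mu\cdot(\lambda^1N^1 + \lambda^2N^2)\\
&= \lambda^1(-PV^1 + \mu\cdot N^1) + \lambda^2(-PV^2 + \mu\cdot N^2)\\
&\le \lambda^1A(T, V^1, N^1) + \lambda^2A(T, V^2, N^2);
\end{aligned}$$

and the others follow in the same way. Specialized to $\lambda^1 + \lambda^2 = 1$, the inequalities express the claimed convexity. □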
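As a quick sanity check of Theorem 4.4.2, the following Python sketch (our own) samples random volumes and mole numbers and tests homogeneity (4.20) and the convexity inequality (4.23) for the closed-form Helmholtz potential $A(T, V, N) = NRT\big(\log\frac{NRT}{V\pi(T)} - 1\big)$ of a single-substance ideal gas, again assuming the hypothetical $\pi(T) = cT^{5/2}$. Random sampling can of course only support, not prove, the inequalities.

```python
import math
import random

R = 8.31447   # J K^-1 mol^-1
c = 1.0       # hypothetical constant in pi(T) = c * T**2.5


def helmholtz(T, V, N):
    """Closed-form Helmholtz potential of a single-substance ideal gas."""
    return N * R * T * (math.log(N * R * T / (V * c * T ** 2.5)) - 1.0)


def check_theorem_4_4_2(T=300.0, trials=2000, seed=1):
    """Return (worst relative homogeneity error, number of convexity violations)."""
    rng = random.Random(seed)
    worst, violations = 0.0, 0
    for _ in range(trials):
        V1, V2 = rng.uniform(1e-3, 0.1), rng.uniform(1e-3, 0.1)
        N1, N2 = rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)
        lam, t = rng.uniform(0.1, 10.0), rng.random()
        scale = R * T * (N1 + N2)                     # energy scale used for tolerances
        a1, a2 = helmholtz(T, V1, N1), helmholtz(T, V2, N2)
        # (4.20): A(T, lam V, lam N) = lam A(T, V, N)
        worst = max(worst, abs(helmholtz(T, lam * V1, lam * N1) - lam * a1) / (lam * scale))
        # (4.23) with lam1 = t, lam2 = 1 - t
        lhs = helmholtz(T, t * V1 + (1 - t) * V2, t * N1 + (1 - t) * N2)
        if lhs > t * a1 + (1 - t) * a2 + 1e-12 * scale:
            violations += 1
    return worst, violations


print(check_theorem_4_4_2())   # tiny homogeneity error and no convexity violations expected
```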
For a system at constant temperature $T$, pressure $P$, and mole number $N$, consisting of a number of parts labeled by a superscript $k$ which are separately in equilibrium, the Gibbs energy is extensive, since

$$G = H - TS + PV = \sum_k\big(H^k - TS^k + PV^k\big) = \sum_k G^k.$$

Equilibrium requires that $\sum_k G^k$ is minimal among all choices with $\sum_k N^k = N$, and by introducing a Lagrange multiplier vector $\mu^*$ for the constraints, we see that in equilibrium, the derivative of $\sum_k\big(G(T, P, N^k) - \mu^*\cdot N^k\big)$ with respect to each $N^k$ must vanish. This implies that

$$\mu^k = \frac{\partial G}{\partial N^k}(T, P, N^k) = \mu^*.$$

Thus, in equilibrium, all $\mu^k$ must be the same. At constant $T$, $V$, and $N$, one can apply the same argument to the Helmholtz potential, and at constant $S$, $V$, and $N$ to the Hamilton potential. In each case, the equilibrium is characterized by the constancy of the intensive parameters.

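The equal-chemical-potential argument can be illustrated for the Helmholtz version mentioned at the end of the preceding paragraph. In the Python sketch below (our own illustration), one substance at fixed $T$ is distributed over two compartments with fixed volumes $V^1$, $V^2$, both modelled as ideal gases with the hypothetical $\pi(T) = cT^{5/2}$. Minimizing $A(T, V^1, n) + A(T, V^2, N - n)$ over $n$ by golden-section search (the sum is convex in $n$) should give equal chemical potentials $\mu = \partial A/\partial N$ in both compartments, i.e., equal densities $n/V^1 = (N - n)/V^2$.

```python
import math

R = 8.31447   # J K^-1 mol^-1
c = 1.0       # hypothetical constant in pi(T) = c * T**2.5


def helmholtz(T, V, N):
    return N * R * T * (math.log(N * R * T / (V * c * T ** 2.5)) - 1.0)


def chemical_potential(T, V, N):
    """mu = dA/dN for the ideal-gas Helmholtz potential above."""
    return R * T * math.log(N * R * T / (V * c * T ** 2.5))


def best_split(T, V1, V2, N, tol=1e-12):
    """Minimize A(T,V1,n) + A(T,V2,N-n) over 0 < n < N by golden-section search."""
    total = lambda n: helmholtz(T, V1, n) + helmholtz(T, V2, N - n)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = 1e-9 * N, (1 - 1e-9) * N
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = total(x1), total(x2)
    while b - a > tol * N:
        if f1 < f2:                      # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = total(x1)
        else:                            # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = total(x2)
    return 0.5 * (a + b)


T, V1, V2, N = 300.0, 0.01, 0.03, 2.0
n = best_split(T, V1, V2, N)
print(n, N * V1 / (V1 + V2))                                          # both close to 0.5
print(chemical_potential(T, V1, n), chemical_potential(T, V2, N - n))  # nearly equal
```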
The degree to which macroscopic space and time correlations are absent characterizes the amount of macroscopic disorder of a system. Global equilibrium states are therefore macroscopically highly uniform; they are the most ordered macroscopic states in the universe rather than the most disordered ones. A system not in global equilibrium is characterized by macroscopic local inhomogeneities, indicating that the space-independent global equilibrium variables alone are not sufficient to describe the system. Its intrinsic complexity is apparent only in a microscopic treatment; cf. Section 7.6 below. The only macroscopic shadow of this complexity is the critical opalescence of fluids near a critical point (Andrews [11], Forster [79]). The content of the second law of thermodynamics for global equilibrium states may therefore be phrased informally as follows: In global equilibrium, macroscopic order (homogeneity) is perfect and microscopic complexity is maximal. In particular, the traditional interpretation of entropy as a measure of disorder is often misleading. Much more carefully argued support for this statement, with numerous examples from teaching practice, can be found in Lambert [150].

4.4.3 Theorem. (Entropy form of the second law)
In an arbitrary state of a standard thermodynamic system,

$$S \le S(H, V, N) := \min\{T^{-1}(H + PV - \mu\cdot N) \mid \Delta(T, P, \mu) = 0\},$$

with equality iff the state is an equilibrium state. The remaining thermal variables are then given by

$$T^{-1} = \frac{\partial S}{\partial H}(H, V, N),\qquad T^{-1}P = \frac{\partial S}{\partial V}(H, V, N),\qquad T^{-1}\mu = -\frac{\partial S}{\partial N}(H, V, N), \tag{4.25}$$

$$U = H = TS(H, V, N) - PV + \mu\cdot N. \tag{4.26}$$

Proof. This is proved in the same way as Theorem 4.4.1. □
This result implies that when a system in which H, V, and N are kept constant reaches equilibrium, the entropy must have increased. Unfortunately, the assumption of constant H, V, and N is unrealistic; such constraints are not easily realized in nature. Under different constraints^, the entropy is no longer maximal.

In systems with several phases, a naive interpretation of the second law as moving systems 
towards increasing disorder is even more inappropriate: A mixture of water and oil sponta- 
neously separates, thus "ordering" the water molecules and the oil molecules into separate 
phases! 

Thus, while the second law in the form of a maximum principle for the entropy has some 
theoretical and historical relevance, it is not the extremal principle ruling nature. The 
irreversible nature of physical processes is instead manifest as energy dissipation which, 
in a microscopic interpretation, indicates the loss of energy to the unmodelled microscopic 
degrees of freedom. Macroscopically, the global equilibrium states are therefore states of 
least free energy, the correct choice of which depends on the boundary condition, with 
the least possible freedom for change. This macroscopic immutability is another intuitive 
explanation for the maximal macroscopic order in global equilibrium states. 

^For example, if one pours milk into a cup of coffee, stirring mixes coffee and milk, thus increasing complexity. Macroscopic order is restored after some time when this increased complexity has become macroscopically inaccessible. Since T, P, and N are constant, the cup of coffee ends up in a state of minimal Gibbs energy, and not in a state of maximal entropy! More formally, the first law shows that, for standard systems at fixed value of the mole number, the value of the entropy decreases when H or V (or both) decrease reversibly; this shows that the value of the entropy may well decrease if accompanied by a corresponding decrease of H or V. The same holds out of equilibrium (though our equilibrium argument no longer applies); for example, the reaction 2 H2 + O2 → 2 H2O (if catalyzed) may happen spontaneously at constant T = 25 °C and P = 1 atm, though it decreases the entropy.

4.5 The approach to equilibrium

Using only the present axioms, one can say a little about the behavior of a system close to equilibrium in the following idealized situation. Suppose that a system at constant S, V, and N which is close to equilibrium at some time t reaches equilibrium at some later time t*. Then the second law implies

$$0 < H(t) - H(t^*) \approx (t - t^*)\frac{dH}{dt},$$

so that dH/dt < 0. We assume that the system is composed of two parts, which are both in equilibrium at times t and t*. Then the time shift induces on both parts a reversible transformation, and the first law can be applied to them. Thus

$$dH = \sum_{k=1,2} dH^k = \sum_{k=1,2}\big(T^k\,dS^k - P^k\,dV^k + \mu^k\cdot dN^k\big).$$

Since S, V, and N remain constant, we have $dS^1 + dS^2 = 0$, $dV^1 + dV^2 = 0$, $dN^1 + dN^2 = 0$, and since for the time shift dH < 0, we find the inequality

$$0 > (T^1 - T^2)\,dS^1 - (P^1 - P^2)\,dV^1 + (\mu^1 - \mu^2)\cdot dN^1.$$

This inequality gives information about the direction of the flow in case all but one of the extensive variables are known to be fixed.

In particular, at constant $V^1$ and $N^1$, we have $dS^1 < 0$ iff $T^1 > T^2$; i.e., "heat" (entropy) flows from the hotter part towards the colder part. At constant $S^1$ and $N^1$, we have $dV^1 < 0$ iff $P^1 < P^2$; i.e., "space" (volume) flows from lower pressure to higher pressure: the volume of the lower pressure part decreases and is compensated by a corresponding increase of the volume in the higher pressure part. And for a pure substance at constant $S^1$ and $V^1$, we have $dN^1 < 0$ iff $\mu^1 > \mu^2$; i.e., "matter" (mole number) flows from higher chemical potential towards lower chemical potential. These qualitative results give temperature, pressure, and chemical potential their familiar intuitive interpretation.
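The sign rules read off from the inequality can be encoded directly. In the following Python sketch, the helper `flow_sign` and its interface are illustrative and not from the text; it returns the direction in which the single free extensive variable of part 1 changes, given the conjugate intensive variables of the two parts.

```python
def flow_sign(variable, intensive1, intensive2):
    """Direction of change of the free extensive variable of part 1.

    variable is "S", "V" or "N"; intensive1 and intensive2 are the conjugate
    intensive variables (T, P or mu of a pure substance) of parts 1 and 2.
    Based on 0 > (T1 - T2) dS1 - (P1 - P2) dV1 + (mu1 - mu2) dN1 with the other
    two extensive variables of part 1 held fixed. Returns -1 if part 1 loses,
    +1 if it gains, and 0 if the intensive variables agree (no driving force).
    """
    if variable not in ("S", "V", "N"):
        raise ValueError("variable must be 'S', 'V' or 'N'")
    diff = intensive1 - intensive2
    if diff == 0:
        return 0
    if variable == "V":
        # -(P1 - P2) dV1 < 0, so dV1 has the same sign as P1 - P2.
        return 1 if diff > 0 else -1
    # (T1 - T2) dS1 < 0 or (mu1 - mu2) dN1 < 0: opposite signs.
    return -1 if diff > 0 else 1


print(flow_sign("S", 350.0, 300.0))  # -1: the hotter part loses entropy
print(flow_sign("V", 1.0e5, 2.0e5))  # -1: the lower-pressure part shrinks
print(flow_sign("N", -2.0, -3.0))    # -1: matter leaves the part with higher mu
```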

This glimpse of nonequilibrium properties is a shadow of the far-reaching fact that, in
nonequilibrium thermodynamics, the intensive variables behave like potentials whose gra- 
dients induce forces that tend to diminish these gradients, thus enforcing (after the time 
needed to reach equilibrium) agreement of the intensive variables of different parts of a 
system. In particular, temperature acts as a thermal potential, whose differences create 
thermal forces which induce thermal currents, a flow of "heat" (entropy), in a similar way 
as differences in electrical potentials create electrical currents, a flow of "electricity" (elec- 
trons)^. While these dynamical issues are outside the scope of the present work, they 
motivate the fact that one can control some intensive parameters of the system by con- 
trolling the corresponding intensive parameters of the environment and making the walls 
permeable to the corresponding extensive quantities. This corresponds to standard proce- 
dures familiar to everyone from ordinary life, such as: heating to change the temperature; 
applying pressure to change the volume; immersion into a substance to change the chemi- 
cal composition; or, in the more general thermal models discussed in Section 7.1, applying 
forces to displace an object. 

The stronger nonequilibrium version of the second law says that (for suitable boundary conditions) equilibrium is actually attained after some time (strictly speaking, only in the limit of infinite time). This implies that the energy difference

$$\delta E := H - U(S,V,N) = H - TS - A(T,V,N) = H - TS + PV - G(T,P,N)$$

is the amount of energy that is dissipated in order to reach equilibrium. In an equilibrium setting, we can only compare what happens to a system prepared in a nonequilibrium state assuming that, subsequently, the full energy difference δE is dissipated so that the system ends up in an equilibrium state. Since few variables describe everything of interest, this constitutes the power of equilibrium thermodynamics. But this power is limited, since equilibrium thermodynamics is silent about when, or whether at all, equilibrium is reached. Indeed, in many cases, only metastable states are reached, which change too slowly to ever reach equilibrium on a human time scale.

Description levels. As we have seen, extensive and intensive variables play completely different roles in equilibrium thermodynamics. Extensive variables such as mass, charge, or volume depend additively on the size of the system. The conjugate intensive variables act as parameters defining the state.

^See Table 7.1 for more parallels in other thermodynamic systems, and Fuchs [83] for a thermodynamics course (and for a German course Job [127]) thoroughly exploiting these parallels.

A system composed of many small subsystems, each in equilibrium, needs for its complete 
characterization the values of the extensive and intensive variables in each subsystem. Such 
a system is in global equilibrium only if its intensive variables are independent of the 
subsystem. On the other hand, the values of the extensive variables may jump at phase
boundaries if (as is the case for multi-phase systems) the equations of state allow
multiple values for the extensive variables to correspond to the same values of the intensive 
variables. If the intensive variables are not independent of the subsystem then, by the 
second law, the differences in the intensive variables of adjacent subsystems give rise to 
thermodynamic forces trying to move the system towards equilibrium. 

A real nonequilibrium system does not actually consist of subsystems in equilibrium; how- 
ever, typically, smaller and smaller pieces behave more and more like equilibrium systems. 
Thus we may view a real system as the continuum limit of a larger and larger number of 
smaller and smaller subsystems, each in approximate equilibrium. As a result, the extensive 
and intensive variables become fields depending on the continuum variables used to label 
the subsystems. For extensive variables, the integral of their fields over the label space gives 
the bulk value of the extensive quantity; thus the fields themselves have a natural inter- 
pretation as a density. For intensive variables, an interpretation as a density is physically 
meaningless; instead, they have a natural interpretation as field strengths. The gradients 
of their fields have physical significance as the sources for thermodynamic forces. 

From this field theory perspective, the extensive variables in the single-phase global equilibrium case have constant densities, and their bulk values are the densities multiplied by the system size (which might be mass, or volume, or another additive parameter), hence scale linearly with the size of the system, while intensive variables are invariant under a change of system size. We do not use the alternative convention to call extensive any variable that scales linearly with the system size, and intensive any variable that is invariant under a change of system size.

We distinguish four nested levels of thermal descriptions, depending on whether the system 
is considered to be in global, local, microlocal, or quantum equilibrium. The highest and 
computationally simplest level, global equilibrium, is concerned with macroscopic situa- 
tions characterized by finitely many space- and time-independent variables. The next level, 
local equilibrium, treats macroscopic situations in a continuum mechanical description, 
where the equilibrium subsystems are labeled by the space coordinates. Therefore the rel- 
evant variables are finitely many space- and time-dependent fields. The next deeper level, 
microlocal^ equilibrium, treats mesoscopic situations in a kinetic description, where the 
equilibrium subsystems are labeled by phase space coordinates. The relevant variables are 
now finitely many fields depending on time, position, and momentum; cf. Balian [15]. The 
bottom level is the microscopic regime, where we must consider quantum equilibrium. 
This no longer fits a thermodynamic framework but must be described in terms of quantum 
dynamical semigroups; see Section 7.2. 

^The term microlocal for a phase space dependent analysis is taken from the literature on partial 
differential equations; see, e.g., Martinez [1]. 
The relations between the different description levels are discussed in Section 7.2. Apart
from descriptions on these clear-cut levels, there are also various hybrid descriptions, where 
some part of a system is described on a more detailed level than the remaining parts, or 
where, as for stirred chemical reactions, the fields are considered to be spatially homoge- 
neous and only the time-dependence matters. 

In global equilibrium, all thermal variables are constant throughout the system, except at 
phase boundaries, where the extensive variables may exhibit jumps and only the intensive 
variables remain constant. This is sometimes referred to as the zeroth law of thermody- 
namics and characterizes global equilibrium; it allows one to measure intensive variables 
(like temperature) by bringing a calibrated instrument that is sensitive to this variable (for 
temperature a thermometer) into equilibrium with the system to be measured. For local 
or microlocal equilibrium, the same intuition applies, but with fields in place of variables. 
Then extensive variables are densities represented by distributions that can be meaning- 
fully integrated over bounded regions, whereas intensive variables are nonsingular fields 
(e.g., pressure) whose integrals are physically irrelevant. 
Chapter 5



Quantities, states, and statistics 



When considered in sufficient detail, no physical system is truly in global equilibrium; one 
can always find smaller or larger deviations. To describe these deviations, extra variables 
are needed, resulting in a more complete but also more complex model. At even higher 
resolution, this model is again imperfect and an approximation to an even more complex, 
better model. This refinement process may be repeated in several stages. At the most 
detailed stages, we transcend the frontier of current knowledge in physics, but even as this 
frontier recedes, deeper and deeper stages with unknown details are imaginable. 

Therefore, it is desirable to have a meta-description of thermodynamics that, starting with 
a detailed model, allows one to deduce the properties of each coarser model, in a way that all
description levels are consistent with the current state of the art in physics. Moreover, the 
results should be as independent as possible of unknown details at the lower levels. This 
meta-description is the subject of statistical mechanics. 

This chapter introduces the technical machinery of statistical mechanics, Gibbs states and 
the partition function, in a uniform way common to classical mechanics and quantum 
mechanics. As in the phenomenological case, the intensive variables determine the state 
(which now is a more abstract object), whereas the extensive variables now appear as 
values of other abstract objects called quantities. This change of setting allows the natural 
incorporation of quantum mechanics, where quantities need not commute, while values are 
numbers observable in principle, hence must satisfy the commutative law. 

The operational meaning of the abstract concepts of quantities, states and values introduced 
in the following becomes apparent once we have recovered the phenomenological results of 
Chapter 4 from the abstract theory developed in this and the next chapter. Chapter 7
discusses in more detail how the theory relates to experiment. 
5.1 Quantities

Any fundamental description of physical systems must give account of the numerical values 
of quantities observable in experiments when the system under consideration is in a specified 
state. Moreover, the form and meaning of states, and of what is observable in principle, 
must be clearly defined. We consider an axiomatic conceptual foundation on the basis of 
quantities^ and their values, consistent with the conventions adopted by the International 
System of Units (SI) [238], who declare: "A quantity in the general sense is a property 
ascribed to phenomena, bodies, or substances that can be quantified for, or assigned to, a
particular phenomenon, body, or substance. [...] The value of a physical quantity is the 
quantitative expression of a particular physical quantity as the product of a number and a 
unit, the number being its numerical value." 

In different states, the quantities of a given system may have different values; the state 
(equivalently, the values determined by it) characterizes an individual system at a particular 
time. Theory must therefore define what to consider as quantities, what as states, and how 
a state assigns values to a quantity. Since quantities can be added, multiplied, compared, 
and integrated, the set of all quantities has an elaborate structure whose properties we 
formulate after the discussion of the following motivating example. 

5.1.1 Example. As a simple example satisfying the axioms to be introduced, the reader may think of an N-level quantum system. The quantities are the elements of the algebra $E = \mathbb{C}^{N\times N}$ of square complex $N \times N$ matrices, the constants are the multiples of the identity matrix, the conjugate $f^*$ of $f$ is given by conjugate transposition, and the integral $\int g = \operatorname{tr} g$ is the trace, the sum of the diagonal entries or, equivalently, the sum of the eigenvalues. The standard basis consisting of the N unit vectors $e_k$ with a one in component k and zeros in all other components corresponds to the N levels of the quantum system. The Hamiltonian H is represented by a diagonal matrix $H = \operatorname{Diag}(E_1,\dots,E_N)$ whose diagonal entries $E_k$ are the energy levels of the system. In the nondegenerate case, all $E_k$ are distinct, and the diagonal matrices comprise all functions of H. Quantities representing arbitrary nondiagonal matrices are less easy to interpret. However, an important class of quantities are the matrices of the form $P = \psi\psi^*$, where $\psi$ is a vector of norm 1; they satisfy $P^2 = P = P^*$ and are the quantities observed in binary measurements such as detector clicks; see Section 7.5. The states of the N-level system are represented by a density matrix $\rho \in E$, a positive semidefinite Hermitian matrix with trace one. The value of a quantity $f \in E$ is the number $\langle f\rangle = \operatorname{tr}\rho f$. The diagonal entries $\rho_k := \rho_{kk}$ represent the probability for obtaining a response in a binary test for the kth quantum level; the off-diagonal entries $\rho_{jk}$ represent deviations from a classical mixture of quantum levels.
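A small pure-Python sketch of Example 5.1.1 for N = 2 may help. It assumes illustrative energy levels $E_1 = 0$, $E_2 = 1$ and an arbitrarily chosen density matrix; matrices are plain lists of rows, and the helper names are hypothetical. It evaluates the values $\langle f\rangle = \operatorname{tr}\rho f$ of the Hamiltonian and of a projector $P = \psi\psi^*$.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def trace(a):
    """The integral of Example 5.1.1: the sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))


def value(rho, f):
    """Value <f> = tr(rho f) of the quantity f in the state with density matrix rho."""
    return trace(matmul(rho, f))


# Hamiltonian of a two-level system with illustrative levels E_1 = 0, E_2 = 1.
H = [[0.0, 0.0], [0.0, 1.0]]

# An illustrative density matrix: Hermitian, trace one, positive semidefinite
# (both diagonal entries and the determinant 0.1775 are positive).
rho = [[0.25, 0.1], [0.1, 0.75]]

# Projector P = psi psi* onto the unit vector psi = (1, 1)/sqrt(2).
psi = [1 / math.sqrt(2), 1 / math.sqrt(2)]
P = [[psi[i] * psi[j].conjugate() for j in range(2)] for i in range(2)]

print(value(rho, H))  # 0.75, i.e. 0.25 * E_1 + 0.75 * E_2
print(value(rho, P))  # about 0.6: value of the binary test for psi
```

The off-diagonal entry 0.1 of `rho` does not affect the energy value but contributes to the value of P, which is one way the deviation from a classical mixture of levels shows up.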

^We deliberately avoid the notion of observables, since it is not clear on a fundamental level what it means to 'observe' something, and since many things (such as the fine structure constant, neutrino masses, decay rates, scattering cross sections) which can be observed in nature are only indirectly related to what is traditionally called an 'observable' in quantum mechanics. The related problem of how to interpret measurements is discussed in Section 7.4.

5.1.2 Definition.

(i) A *-algebra is a set E together with operations on E defining for any two quantities $f, g \in E$ the sum $f + g \in E$, the product $fg \in E$, and the conjugate $f^* \in E$, such that the following axioms (Q1)-(Q4) hold for all $\alpha \in \mathbb{C}$ and all $f, g, h \in E$:

(Q1) $\mathbb{C} \subseteq E$, i.e., complex numbers are special elements called constants, for which addition, multiplication and conjugation have their traditional meaning.

(Q2) $(fg)h = f(gh)$, $\alpha f = f\alpha$, $0f = 0$, $1f = f$.

(Q3) $(f+g)+h = f+(g+h)$, $f(g+h) = fg + fh$, $f + 0 = f$.

(Q4) $f^{**} = f$, $(fg)^* = g^*f^*$, $(f+g)^* = f^* + g^*$.

(ii) A *-algebra E is called commutative if $fg = gf$ for all $f, g \in E$, and noncommutative otherwise. The *-algebra E is called nondegenerate if

(Q5) $f^*f = 0 \Rightarrow f = 0$.

(iii) We introduce the notation

$$-f := (-1)f,\quad f - g := f + (-g),\quad [f,g] := fg - gf,$$
$$f^0 := 1,\quad f^l := f^{l-1}f \quad (l = 1,2,\dots),$$
$$\operatorname{Re} f := \tfrac12(f + f^*),\quad \operatorname{Im} f := \tfrac{1}{2i}(f - f^*),$$

for $f, g \in E$. $[f,g]$ is called the commutator of f and g, and Re f, Im f are referred to as the real part (or Hermitian part) and imaginary part of f, respectively. $f \in E$ is called Hermitian if $f^* = f$.

(iv) A *-homomorphism is a mapping $\phi$ from a *-algebra E with unity to another (or the same) *-algebra E' with unity such that

$$\phi(f+g) = \phi(f) + \phi(g),\quad \phi(fg) = \phi(f)\phi(g),\quad \phi(\alpha f) = \alpha\phi(f),\quad \phi(f^*) = \phi(f)^*,\quad \phi(1) = 1$$

for all f, g in E and $\alpha \in \mathbb{C}$.

Note that we assume commutativity only for the product of numbers and elements of E. In general, the product of two elements of E need not be commutative. However, general commutativity of the addition is a consequence of our other assumptions. We prove this together with some other useful relations.

5.1.3 Proposition.

(i) For all $f, g, h \in E$,

$$(f+g)h = fh + gh,\quad f - f = 0,\quad f + g = g + f, \tag{5.1}$$
$$[f, f^*] = -2i[\operatorname{Re} f, \operatorname{Im} f]. \tag{5.2}$$

(ii) For all $f \in E$, Re f and Im f are Hermitian. f is Hermitian iff $f = \operatorname{Re} f$ iff $\operatorname{Im} f = 0$. If f, g are commuting Hermitian quantities then fg is Hermitian, too.
Proof. (i) The right distributive law follows from

$$(f+g)h = ((f+g)h)^{**} = (h^*(f+g)^*)^* = (h^*(f^*+g^*))^* = (h^*f^* + h^*g^*)^* = (h^*f^*)^* + (h^*g^*)^* = f^{**}h^{**} + g^{**}h^{**} = fh + gh.$$

It implies $f - f = 1f - 1f = (1-1)f = 0f = 0$. From this, we may deduce that addition is commutative, as follows. The quantity $h := -f + g$ satisfies

$$-h = (-1)(-f + g) = (-1)(-1)f + (-1)g = f - g,$$

and we have

$$f + g = f + (h - h) + g = (f + h) + (-h + g) = (f - f + g) + (f - g + g) = g + f.$$

This proves (5.1). If $u = \operatorname{Re} f$, $v = \operatorname{Im} f$ then $u^* = u$, $v^* = v$ and $f = u + iv$, $f^* = u - iv$. Hence

$$[f, f^*] = (u+iv)(u-iv) - (u-iv)(u+iv) = 2i(vu - uv) = -2i[\operatorname{Re} f, \operatorname{Im} f],$$

giving (5.2).

(ii) The first two assertions are trivial, and the third holds since $(fg)^* = g^*f^* = gf = fg$ if f, g are Hermitian and commute. □
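A quick numerical check of (5.2) in the matrix algebra of Example 5.1.1, assuming an arbitrarily chosen complex 2 × 2 matrix f. It illustrates the identity on one sample and is, of course, no substitute for the proof; the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def adjoint(a):
    """Conjugate transpose a*."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def lin(alpha, a, beta, b):
    """Linear combination alpha*a + beta*b of two matrices of equal shape."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(alpha, a):
    return [[alpha * x for x in row] for row in a]


def commutator(a, b):
    """[a, b] = ab - ba."""
    return lin(1, matmul(a, b), -1, matmul(b, a))


def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


f = [[1 + 2j, 3 - 1j], [0.5j, -2 + 1j]]  # arbitrary sample quantity
fs = adjoint(f)
u = lin(0.5, f, 0.5, fs)          # Re f = (f + f*)/2
v = lin(1 / 2j, f, -1 / 2j, fs)   # Im f = (f - f*)/(2i)

lhs = commutator(f, fs)                 # [f, f*]
rhs = scale(-2j, commutator(u, v))      # -2i [Re f, Im f]
print(max_diff(lhs, rhs))               # zero up to rounding
print(max_diff(u, adjoint(u)), max_diff(v, adjoint(v)))  # Re f, Im f Hermitian
```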

5.1.4 Definition.

(i) The *-algebra E is called partially ordered if there is a partial order ≥ satisfying the following axioms (Q6)-(Q9) for all $f, g, h \in E$:

(Q6) ≥ is reflexive ($f \ge f$), antisymmetric ($f \ge g \ge f \Rightarrow f = g$), and transitive ($f \ge g \ge h \Rightarrow f \ge h$).

(Q7) $f \ge g \Rightarrow f + h \ge g + h$.

(Q8) $f \ge 0 \Rightarrow f = f^*$ and $g^*fg \ge 0$.

(Q9) $1 \ge 0$.

We introduce the notation

$$f \le g \;:\Leftrightarrow\; g \ge f,\qquad \|f\| := \inf\{\alpha \in \mathbb{R} \mid f^*f \le \alpha^2,\ \alpha \ge 0\},$$

where the infimum of the empty set is taken to be ∞. The number ‖f‖ is referred to as the (spectral) norm of f. An element $f \in E$ is called bounded if $\|f\| < \infty$. The uniform topology is the topology induced on E by declaring as open sets arbitrary unions of finite intersections of the open balls $\{f \in E \mid \|f - f_0\| < \varepsilon\}$ for some $\varepsilon > 0$ and some $f_0 \in E$.
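For the matrix algebra of Example 5.1.1 with the usual ordering ($f \ge g$ iff $f - g$ is positive semidefinite), the infimum defining ‖f‖ is the square root of the largest eigenvalue of $f^*f$, i.e., the largest singular value of f. The following is a minimal sketch for 2 × 2 matrices using the closed-form eigenvalues of a Hermitian 2 × 2 matrix; the sample matrices are arbitrary, and the last two lines spot-check (5.7) and (5.8) of the next proposition.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def adjoint(a):
    """Conjugate transpose a*."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def spectral_norm(f):
    """||f|| for a 2x2 complex matrix f under the positive-semidefinite order.

    f*f <= alpha^2 holds iff alpha^2 bounds every eigenvalue of the Hermitian
    matrix f*f, so the infimum is the square root of its largest eigenvalue."""
    m = matmul(adjoint(f), f)
    a, d, b = m[0][0].real, m[1][1].real, m[0][1]
    lam_max = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return math.sqrt(lam_max)


print(spectral_norm([[3, 0], [0, -4]]))  # 4.0
print(spectral_norm([[0, 1], [0, 0]]))   # 1.0 (nilpotent, but of norm one)

# Spot checks of (5.7) and (5.8) on arbitrary sample matrices.
f = [[1 + 1j, 2], [0, -1j]]
g = [[0.5, -1], [2j, 1]]
print(spectral_norm(add(f, g)) <= spectral_norm(f) + spectral_norm(g))
print(spectral_norm(matmul(f, g)) <= spectral_norm(f) * spectral_norm(g))
```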
5.1.5 Proposition.

(i) For all quantities $f, g, h \in E$ and $\lambda \in \mathbb{C}$,

$$f^*f \ge 0,\quad ff^* \ge 0, \tag{5.3}$$
$$f^*f \le 0 \;\Rightarrow\; \|f\| = 0 \;\Rightarrow\; f = 0, \tag{5.4}$$
$$f \le g \;\Rightarrow\; h^*fh \le h^*gh,\quad |\lambda|f \le |\lambda|g, \tag{5.5}$$
$$f^*g + g^*f \le 2\|f\|\,\|g\|, \tag{5.6}$$
$$\|\lambda f\| = |\lambda|\,\|f\|,\quad \|f \pm g\| \le \|f\| + \|g\|, \tag{5.7}$$
$$\|fg\| \le \|f\|\,\|g\|. \tag{5.8}$$

(ii) Among the complex numbers, precisely the nonnegative real numbers λ satisfy λ ≥ 0.

Proof. (i) (5.3)-(5.5) follow directly from (Q7)-(Q9). Now let $\alpha = \|f\|$, $\beta = \|g\|$. Then $f^*f \le \alpha^2$ and $g^*g \le \beta^2$. Since

$$0 \le (\beta f - \alpha g)^*(\beta f - \alpha g) = \beta^2 f^*f - \alpha\beta(f^*g + g^*f) + \alpha^2 g^*g \le \beta^2\alpha^2 - \alpha\beta(f^*g + g^*f) + \alpha^2\beta^2,$$

we have $f^*g + g^*f \le 2\alpha\beta$ if $\alpha\beta \ne 0$, and for $\alpha\beta = 0$, the same follows from (5.4). Therefore (5.6) holds. The first half of (5.7) is trivial, and the second half follows for the plus sign from

$$(f+g)^*(f+g) = f^*f + f^*g + g^*f + g^*g \le \alpha^2 + 2\alpha\beta + \beta^2 = (\alpha+\beta)^2,$$

and then for the minus sign from the first half. Finally, by (5.5),

$$(fg)^*(fg) = g^*f^*fg \le g^*\alpha^2 g = \alpha^2 g^*g \le \alpha^2\beta^2.$$

This implies (5.8).

(ii) If λ is a nonnegative real number then $\lambda = f^*f \ge 0$ with $f = \sqrt{\lambda}$. If λ is a negative real number then $\lambda = -f^*f \le 0$ with $f = \sqrt{-\lambda}$, and by antisymmetry, λ ≥ 0 is impossible. If λ is a nonreal number then $\lambda \ne \lambda^*$ and λ ≥ 0 is impossible by (Q8). □



5.1.6 Definition. A Euclidean *-algebra is a nondegenerate, partially ordered *-algebra E, whose elements are called quantities, together with a complex-valued integral ∫ defined on a subspace S of E, whose elements are called strongly integrable, satisfying the following axioms (EA1)-(EA6):

(EA1) g bounded, h strongly integrable ⇒ $h^*$, $gh$, $hg$ strongly integrable,

(EA2) $\int h^*h > 0$ if $h \ne 0$,

(EA3) $(\int h)^* = \int h^*$, $\int gh = \int hg$,

(EA4) $\int h^*gh = 0$ for all strongly integrable h ⇒ $g = 0$ (nondegeneracy),

(EA5) $\int h_l^*h_l \to 0 \Rightarrow \int gh_l \to 0$, $\int h_l^*gh_l \to 0$,

(EA6) $h_l \downarrow 0 \Rightarrow \inf_l \int h_l = 0$ (Dini property).

Here, integrals extend over the longest following product or quotient (while later, differential operators act on the shortest syntactically meaningful term), and the monotonic limit is defined by $g_l \downarrow 0$ iff, for every strongly integrable h, the sequence (or net) $\int h^*g_lh$ consists of real numbers converging monotonically decreasing to zero.

Note that the integral can often be naturally extended from strongly integrable quantities 
to a significantly larger space of integrable quantities. 

5.1.7 Proposition.

$$g \in E,\ \int gf = 0 \text{ for all } f \in E \;\Rightarrow\; g = 0. \tag{5.9}$$

For strongly integrable g, h,

$$\int (gh)^*(gh) \le \int g^*g \int h^*h. \qquad \text{(Cauchy-Schwarz inequality)} \tag{5.10}$$

In particular, every strongly integrable quantity is bounded.

Proof. If $\int gf = 0$ for all $f \in E$ then this holds in particular for $f = hh^*$. Thus $0 = \int ghh^* = \int h^*gh$ by (EA3), and (EA4) gives the desired conclusion (5.9). (5.10) holds since by (EA2), $\int g^*h$ defines a positive definite inner product on S, and directly implies the final statement. □
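In the matrix algebra with $\int = \operatorname{tr}$, (5.10) says that the Frobenius (Hilbert-Schmidt) norm is submultiplicative, and the final statement becomes $\|g\|^2 \le \operatorname{tr} g^*g$. The following sketch checks both on one arbitrarily chosen pair of 2 × 2 matrices; the sample values and helper names are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def adjoint(a):
    """Conjugate transpose a*."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def spectral_norm(f):
    """Largest singular value of a 2x2 matrix (the norm ||f|| of Definition 5.1.4)."""
    m = matmul(adjoint(f), f)
    a, d, b = m[0][0].real, m[1][1].real, m[0][1]
    return math.sqrt((a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2))


def integral_of_square(x):
    """int x* x = tr(x* x), a nonnegative real number by (EA2)."""
    return trace(matmul(adjoint(x), x)).real


g = [[1 + 1j, 2], [0, -1j]]  # arbitrary sample quantities
h = [[0.5, -1], [2j, 1]]
gh = matmul(g, h)

print(integral_of_square(gh) <= integral_of_square(g) * integral_of_square(h))  # True
print(spectral_norm(g) ** 2 <= integral_of_square(g))  # True: bounded by tr(g* g)
```

For these samples, $\operatorname{tr}(gh)^*(gh) = 27.5$ while $\operatorname{tr}g^*g\,\operatorname{tr}h^*h = 7 \cdot 6.25 = 43.75$.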

We now describe the basic Euclidean *-algebras relevant in nonrelativistic physics. However, the remainder is completely independent of details of how the axioms are realized; a specific realization is needed only when doing specific quantitative calculations.

5.1.8 Examples. 

(i) (N-level quantum systems) The simplest family of Euclidean *-algebras is the algebra $E = \mathbb{C}^{N\times N}$ of square complex $N \times N$ matrices; cf. Example 5.1.1. Here the quantities are square matrices, the constants are the multiples of the identity matrix, the conjugate is conjugate transposition, and the integral is the trace, the sum of the diagonal entries or, equivalently, the sum of the eigenvalues. In particular, all quantities are strongly integrable.
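The axioms (EA2) and (EA3) for the trace can be spot-checked numerically on random complex matrices. This sketch uses Python's standard `random` module with a fixed seed; passing the checks on samples illustrates, but does not prove, the axioms.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def adjoint(a):
    """Conjugate transpose a*."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def random_matrix(rng, n):
    """n x n matrix with real and imaginary parts uniform in [-1, 1]."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
            for _ in range(n)]


rng = random.Random(0)  # fixed seed for reproducibility
for _ in range(100):
    g, h = random_matrix(rng, 3), random_matrix(rng, 3)
    hh = trace(matmul(adjoint(h), h))
    assert hh.real > 0 and abs(hh.imag) < 1e-12                     # (EA2)
    assert abs(trace(adjoint(h)) - trace(h).conjugate()) < 1e-12    # (EA3), first part
    assert abs(trace(matmul(g, h)) - trace(matmul(h, g))) < 1e-12   # (EA3), cyclicity
print("trace satisfies (EA2) and (EA3) on all samples")
```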

(ii) (Nonrelativistic classical mechanics) An atomic N-particle system is described in classical mechanics by the phase space with six coordinates, position $x^a \in \mathbb{R}^3$ and momentum $p^a \in \mathbb{R}^3$, for each particle. The algebra

$$E_N := C^\infty(\mathbb{R}^{6N})$$

of smooth complex-valued functions $g(x^{1:N}, p^{1:N})$ of positions and momenta is a commutative Euclidean *-algebra with complex conjugation as conjugate and the Liouville integral

$$\int g = C^{-1}\int dp^{1:N}\,dx^{1:N}\,g(x^{1:N}, p^{1:N}),$$

where C is a positive constant. Strongly integrable quantities are the Schwartz functions in $E_N$. The axioms are easily verified.

(iii) (Classical fluids) A fluid is classically described by an atomic system with an indefinite number of particles. The appropriate Euclidean *-algebra for a single species of monatomic particles is the direct sum $E = \bigoplus_{N\ge 0} E_N$ whose quantities are infinite sequences $g = (g_0, g_1, \dots)$ of $g_N \in E_N$, with $E_N$ as in (ii), and weighted Liouville integral

$$\int g = \sum_{N\ge 0} C_N^{-1}\int dp^{1:N}\,dx^{1:N}\,g_N(x^{1:N}, p^{1:N}).$$

Here $C_N$ is a symmetry factor for the symmetry group of the N-particle system, which equals $h^{3N}N!$ for indistinguishable particles; $h = 2\pi\hbar$ is Planck's constant. This accounts for the Maxwell statistics and gives the correct entropy of mixing. Classical fluids with monatomic particles of several different kinds require a tensor product of several such algebras, and classical fluids composed of molecules require additional degrees of freedom to account for the rotation and vibration of the molecules.

(iv) (Nonrelativistic quantum mechanics) Let $\mathbb{H}$ be a Euclidean space, a dense subspace of a Hilbert space. Then the algebra $E := \operatorname{Lin}\mathbb{H}$ of continuous linear operators on $\mathbb{H}$ is a Euclidean *-algebra with the adjoint as conjugate and the quantum integral

$$\int g = \operatorname{tr} g,$$

given by the trace of the quantity in the integrand. Strongly integrable quantities are the operators $g \in E$ which are trace class; this includes all linear operators of finite rank. Again, the axioms are easily verified. In the quantum context, Hermitian quantities f are often referred to as observables; but we do not use this term here.

We end this section by stating some results needed later. The exposition in this and the next chapter is fully rigorous if the statements of Proposition 5.1.9 and Proposition 5.1.10 are assumed in addition to (EA1)-(EA6). We prove these propositions only in case that E is finite-dimensional^. But they can also be proved if the quantities involved are smooth functions, or if they have a spectral resolution; cf., e.g., Thirring [240] (who works in the framework of C*-algebras and von Neumann algebras).

^We would appreciate being informed about possible proofs in general that only use the properties of Euclidean *-algebras (and perhaps further, elementary assumptions).

5.1.9 Proposition. For arbitrary quantities f, g,

$$f \ge 0 \;\Rightarrow\; \sqrt{f} \ge 0,\quad (\sqrt{f})^2 = f.$$

For any quantity $f = f(s)$ depending continuously on $s \in [a,b]$,

$$\int \int_a^b f(s)\,ds = \int_a^b \Big(\int f(s)\Big)\,ds,$$

and for any quantity $f = f(\lambda)$ depending continuously differentiably on a parameter vector λ,

$$\frac{\partial}{\partial\lambda}\int f = \int \frac{\partial f}{\partial\lambda}.$$

Proof. In finite dimensions, the first four assertions are standard matrix calculus, and the remaining two statements hold since $\int f$ must be a finite linear combination of the components of f. □



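5.1.10 Proposition. Let f, g be quantities depending continuously differentiably on a parameter or parameter vector λ, and suppose that

$$[f(\lambda), g(\lambda)] = 0 \quad \text{for all } \lambda.$$

Then, for any continuously differentiable function F of two variables,

$$\frac{\partial}{\partial\lambda}\int F(f,g) = \int \partial_1F(f,g)\,\frac{\partial f}{\partial\lambda} + \int \partial_2F(f,g)\,\frac{\partial g}{\partial\lambda}. \tag{5.11}$$

Here $\partial_1F$ and $\partial_2F$ denote differentiation by the first and second argument of F, respectively.

Proof. We prove the special case $F(x,y) = x^my^n$, where (5.11) reduces to

$$\frac{\partial}{\partial\lambda}\int f^mg^n = m\int f^{m-1}g^n\,\frac{\partial f}{\partial\lambda} + n\int f^mg^{n-1}\,\frac{\partial g}{\partial\lambda}. \tag{5.12}$$

The general case then follows for polynomials F(x, y) by taking suitable linear combinations, and for arbitrary F by a limiting procedure. To prove (5.12), we note that, more generally,

$$\frac{\partial}{\partial\lambda}\int f_1\cdots f_{m+n} = \int \frac{\partial}{\partial\lambda}(f_1\cdots f_{m+n}) = \sum_{j=1}^{m+n}\int f_1\cdots f_{j-1}\,\frac{\partial f_j}{\partial\lambda}\,f_{j+1}\cdots f_{m+n} = \sum_{j=1}^{m+n}\int \frac{\partial f_j}{\partial\lambda}\,f_{j+1}\cdots f_{m+n}f_1\cdots f_{j-1},$$

using the cyclic commutativity (EA3) of the integral. If we specialize to $f_j = f$ if $j \le m$, $f_j = g$ if $j > m$, and note that f and g commute, we arrive at (5.12). □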



Of course, the proposition generalizes to families of more than two commuting quantities; 
but more important is the special case g = f: 
5.1.11 Corollary. For any quantity f depending continuously differentiably on a parameter vector λ, and any continuously differentiable function F of a single variable,

$$\frac{\partial}{\partial\lambda}\int F(f) = \int F'(f)\,\frac{\partial f}{\partial\lambda}. \tag{5.13}$$
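Corollary 5.1.11 holds although f and ∂f/∂λ need not commute; it is the integral that makes the naive chain rule valid. The sketch below takes $F(x) = x^3$ and $f(\lambda) = A + \lambda B$ for two illustrative noncommuting Hermitian 2 × 2 matrices A, B (arbitrary choices) and compares a central finite difference with the right-hand side of (5.13), using the trace as the integral.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lin(alpha, a, beta, b):
    """Linear combination alpha*a + beta*b of two matrices of equal shape."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


A = [[1.0, 2.0], [2.0, -1.0]]  # illustrative Hermitian matrices with AB != BA
B = [[0.0, 1j], [-1j, 3.0]]


def f(lam):
    """f(lambda) = A + lambda B, so df/dlambda = B."""
    return lin(1.0, A, lam, B)


def cube(m):
    """F(f) = f^3, so F'(f) = 3 f^2."""
    return matmul(m, matmul(m, m))


lam, eps = 0.7, 1e-5
fl = f(lam)

# Left side of (5.13): central difference of lambda -> tr F(f(lambda)).
lhs = (trace(cube(f(lam + eps))) - trace(cube(f(lam - eps)))) / (2 * eps)
# Right side of (5.13): tr F'(f) df/dlambda = 3 tr(f^2 B).
rhs = 3 * trace(matmul(matmul(fl, fl), B))
print(abs(lhs - rhs))  # tiny: finite-difference and rounding error only

# Without the trace the naive chain rule fails, because f and B do not commute.
deriv = lin(1 / (2 * eps), cube(f(lam + eps)), -1 / (2 * eps), cube(f(lam - eps)))
naive = [[3 * x for x in row] for row in matmul(matmul(fl, fl), B)]  # 3 f^2 B
gap = max(abs(deriv[i][j] - naive[i][j]) for i in range(2) for j in range(2))
print(gap)  # clearly nonzero
```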



5.2 Gibbs states 



Our next task is to specify the formal properties of the value of a quantity.

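5.2.1 Definition. A state is a mapping that assigns to all quantities f from a subspace of E containing all bounded quantities its value $\overline{f} =: \langle f\rangle \in \mathbb{C}$ such that for all $f, g \in E$, $\alpha \in \mathbb{C}$,

(E1) $\langle 1\rangle = 1$, $\langle f^*\rangle = \langle f\rangle^*$, $\langle f + g\rangle = \langle f\rangle + \langle g\rangle$,

(E2) $\langle \alpha f\rangle = \alpha\langle f\rangle$,

(E3) If $f \ge 0$ then $\langle f\rangle \ge 0$,

(E4) If $f_l \in E$, $f_l \downarrow 0$ then $\langle f_l\rangle \downarrow 0$.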



Note that this formal definition of a state - always used in the remainder of the book 
- differs from the phenomenological thermodynamic states defined in Section 4.1. The 
connection between the two notions will be made in Section 6.2. 

Statistical mechanics essentially originated with Josiah Willard Gibbs, whose 1902 book 
Gibbs [90] on (at that time of course only classical) statistical mechanics is still readable. 
See Uffink [246] for a history of the subject. 

All states arising in thermodynamics have the following particular form. 

5.2.2 Definition. A Gibbs state is defined by assigning to any g the value

$$\langle g\rangle := \int e^{-S/\bar k}g, \tag{5.14}$$

where S, called the entropy of the state, is a Hermitian quantity with strongly integrable $e^{-S/\bar k}$ satisfying the normalization condition

$$\int e^{-S/\bar k} = 1, \tag{5.15}$$

and $\bar k$ is the Boltzmann constant

$$\bar k \approx 1.38065\cdot 10^{-23}\ \mathrm{J/K}. \tag{5.16}$$

Theorem 5.2.3 below implies that a Gibbs state is indeed a state.
The Boltzmann constant defines the units in which the entropy is measured. In analogy^ with Planck's constant ħ, we write $\bar k$ in place of the customary k or $k_B$, in order to be free to use the letter k for other purposes. By a change of units one can enforce any value of $\bar k$. Chemists use instead of particle number N the corresponding mole number, which differs by a fixed numerical factor, the Avogadro constant

$$N_A = R/\bar k \approx 6.02214\cdot 10^{23}\ \mathrm{mol}^{-1},$$

where R is the universal gas constant (4.8). As a result, all results from statistical mechanics may be translated to phenomenological thermodynamics by setting $\bar k = R$, corresponding to setting 1 mol $= 6.02214\cdot 10^{23}$, the number of particles in one mole of a pure substance.
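For concreteness, the conversion can be checked with the exact values fixed by the 2019 revision of the SI, $\bar k = 1.380649\cdot 10^{-23}$ J/K and $N_A = 6.02214076\cdot 10^{23}\,\mathrm{mol}^{-1}$; the sample entropy $\bar k\log 2$ per particle is an arbitrary illustration.

```python
import math

K_BAR = 1.380649e-23  # Boltzmann constant in J/K (exact by definition since 2019)
N_A = 6.02214076e23   # Avogadro constant in 1/mol (exact by definition since 2019)

R = N_A * K_BAR       # universal gas constant in J/(mol K)
print(R)              # approximately 8.314462618

# An entropy of k_bar * log 2 per particle corresponds to R * log 2 per mole.
s_particle = K_BAR * math.log(2)
s_mole = N_A * s_particle
print(s_mole)         # approximately 5.763 J/(mol K)
```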

What is here called entropy has a variety of alternative names in the literature on statistical mechanics. For example, Gibbs [90], who first noticed the rich thermodynamic implications of states defined by (5.14), called −S the index of probability; Alhassid & Levine [5] and Balian [15] use the name surprisal for S. Our terminology is close to that of Mrugala et al. [176], who call S the microscopic entropy, and Hassan et al. [109], who call S the information(al) entropy operator. What is traditionally (and in Section 4.1) called entropy and denoted by S is in the present setting the value $\overline{S} = \langle S\rangle$.

5.2.3 Theorem.

(i) A Gibbs state determines its entropy uniquely.

(ii) For any Hermitian quantity f with strongly integrable $e^{-f}$, the mapping $\langle\cdot\rangle_f$ defined by

$$\langle g\rangle_f := Z_f^{-1}\int e^{-f}g, \quad\text{where } Z_f := \int e^{-f}, \qquad (5.17)$$

is a state. It is a Gibbs state with entropy

$$S_f := \bar k\,(f + \log Z_f). \qquad (5.18)$$

(iii) The KMS condition (cf. Kubo [143], Martin & Schwinger [166])

$$\langle gh\rangle_f = \langle h\,Q_fg\rangle_f \quad\text{for bounded } g, h \qquad (5.19)$$

holds. Here $Q_f$ is the linear mapping defined by

$$Q_fg := e^{-f}g\,e^{f}.$$

Proof. (i) If the entropies S and S' define the same Gibbs state then

$$\int \big(e^{-S/\bar k} - e^{-S'/\bar k}\big)g = \langle g\rangle - \langle g\rangle' = 0$$

for all g, hence (5.9) gives $e^{-S/\bar k} - e^{-S'/\bar k} = 0$. Thus $e^{-S/\bar k} = e^{-S'/\bar k}$, hence $S = S'$ by Proposition 5.1.9.

¹ As we shall see in (14.20) and (6.42), $\hbar$ and $\bar k$ indeed play analogous roles in quantum mechanical and thermodynamic uncertainty relations.
(ii) The quantity $d := e^{-f/2}$ is nonzero and satisfies $d^* = d$, $e^{-f} = d^*d \ge 0$. Hence $Z_f > 0$ by (EA2), and $\rho := Z_f^{-1}e^{-f}$ is Hermitian and nonnegative. For $h \ge 0$, the quantity $g = \sqrt h$ is Hermitian (by Proposition 5.1.9) and satisfies $g\rho g^* = Z_f^{-1}(gd)(gd)^* \ge 0$, hence by (EA3),

$$\langle h\rangle_f = \langle g^*g\rangle_f = \int \rho g^*g = \int g\rho g^* \ge 0.$$

Moreover, $\langle 1\rangle_f = Z_f^{-1}\int e^{-f} = 1$. Similarly, if $g \ge 0$ then $g = h^*h$ with $h = \sqrt g = h^*$, and with $k := e^{-f/2}h$ we get

$$Z_f\langle g\rangle_f = \int e^{-f}hh^* = \int h^*e^{-f}h = \int k^*k \ge 0.$$

This implies (E3). The other axioms (E1)-(E4) follow easily from the corresponding properties of the integral. Thus $\langle\cdot\rangle_f$ is a state. Finally, with the definition (5.18), we have

$$\int e^{-S_f/\bar k}g = \int e^{-f-\log Z_f}g = Z_f^{-1}\int e^{-f}g = \langle g\rangle_f,$$

whence $\langle\cdot\rangle_f$ is a Gibbs state.

(iii) By (EA3), $\langle hQ_fg\rangle_f = Z_f^{-1}\int e^{-f}hQ_fg = Z_f^{-1}\int Q_fg\,e^{-f}h = Z_f^{-1}\int e^{-f}gh = \langle gh\rangle_f$. □

Note that the state (5.17) is unaltered when f is shifted by a constant. $Q_f$ is called the modular automorphism of the state $\langle\cdot\rangle_f$ since $Q_f(gh) = Q_f(g)Q_f(h)$; for a classical system, $Q_f$ is the identity. In the following, we shall not make use of the KMS condition;
however, it plays an important role in the mathematics of the thermodynamic limit (cf. 
Thirring [240]). 
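Theorem 5.2.3 can be checked numerically in the finite-dimensional quantum case, where the integral is the trace and quantities are Hermitian matrices. The NumPy sketch below is a minimal illustration under these assumptions (units with $\bar k = 1$; the helper names `herm_exp`, `gibbs_value` and `gibbs_entropy` are hypothetical): it builds $\langle\cdot\rangle_f$ from (5.17), checks the normalization (5.15) of $S_f$ from (5.18), the invariance under shifting f by a constant, and the KMS condition (5.19).

```python
import numpy as np

K_BAR = 1.0  # units in which k_bar = 1 (assumption for this sketch)


def herm_exp(f, t=1.0):
    """Return exp(t*f) for a Hermitian matrix f via its eigendecomposition."""
    w, v = np.linalg.eigh(f)
    return (v * np.exp(t * w)) @ v.conj().T


def random_hermitian(n, rng):
    """A random complex Hermitian n x n matrix (test data only)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2


def gibbs_value(f, g):
    """<g>_f = Z_f^{-1} tr(e^{-f} g), i.e. (5.17) with the integral taken as the trace."""
    e = herm_exp(f, -1.0)
    return np.trace(e @ g) / np.trace(e)


def gibbs_entropy(f):
    """S_f = k_bar (f + log Z_f), eq. (5.18)."""
    z = np.trace(herm_exp(f, -1.0)).real
    return K_BAR * (f + np.log(z) * np.eye(len(f)))


rng = np.random.default_rng(42)
f, g, h = (random_hermitian(4, rng) for _ in range(3))
one = np.eye(4)

# (5.15): e^{-S_f/k_bar} has unit trace, so <.>_f is the Gibbs state with entropy S_f
norm = np.trace(herm_exp(gibbs_entropy(f), -1.0 / K_BAR)).real
# Shifting f by a constant does not change the state
shift_ok = np.isclose(gibbs_value(f, g), gibbs_value(f + 3.0 * one, g))
# KMS condition (5.19) with the modular map Q_f g = e^{-f} g e^{f}
q_f_g = herm_exp(f, -1.0) @ g @ herm_exp(f, 1.0)
kms_ok = np.isclose(gibbs_value(f, g @ h), gibbs_value(f, h @ q_f_g))
print(norm, shift_ok, kms_ok)
```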

$Z_f$ is called the partition function of f; it is a function of whatever parameters appear in a particular form given to f in the applications. A large part of traditional statistical mechanics is concerned with the calculation, for given f, of the partition function $Z_f$ and of the values $\langle g\rangle_f$ for selected quantities g. As we shall see, the basic results of statistical
mechanics are completely independent of the details involved, and it is this basic part that 
we concentrate upon in this book. 

5.2.4 Example. A canonical ensemble² is defined as a Gibbs state whose entropy is an affine function of a Hermitian quantity H, called the Hamiltonian:

$$S = \bar k\beta H + \text{const},$$

with a constant depending on β, computable from (5.18) and the partition function

$$Z = \int e^{-\beta H}$$

of $f = \beta H$.

² Except in the traditional notions of a microcanonical, canonical, or grand canonical ensemble, we avoid the term ensemble, which in statistical mechanics is de facto used as a synonym for state but often has the connotation of a large real or imagined collection of identical copies of a system. The latter interpretation has well-known difficulties in explaining why each single macroscopic system is described correctly by thermodynamics; see, e.g., Sklar [228].

In particular, in the quantum case, where $\int$ is the trace, the finiteness of Z implies that S and hence H must have a discrete spectrum that is bounded below. Hence the partition function takes the familiar form

$$Z = \operatorname{tr} e^{-\beta H} = \sum_{n\in\mathcal N} e^{-\beta E_n}, \qquad (5.20)$$

where the $E_n$ ($n\in\mathcal N$) are the energy levels, the eigenvalues of H. If the spectrum of H is known, this leads to explicit formulas for Z. For example, a two-level system is defined by the energy levels 0, E (or $E_0$ and $E_0 + E$, which gives the same results), and has

$$Z = 1 + e^{-\beta E}. \qquad (5.21)$$

It describes a single Fermion mode, but also many other systems at low temperature; cf. 
(6.56). In particular, it is the basis of laser-induced chemical reactions in photochemistry 
(see, e.g., Karlov [129], Murov et al. [178]), where only two electronic energy levels (the
ground state and the first excited state) are relevant; cf. the discussion of (6.56) below. 

For a harmonic oscillator, defined by the energy levels nE, n = 0, 1, 2, ..., and describing a single Boson mode, we have

$$Z = \sum_{n=0}^{\infty} e^{-n\beta E} = \big(1 - e^{-\beta E}\big)^{-1}.$$
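The partition functions above can be evaluated directly from a list of energy levels. The plain-Python sketch below (illustrative parameter values; the function names are hypothetical) compares (5.21) and the geometric-series formula for the oscillator with the sum (5.20), truncated for the oscillator, and shows that shifting all levels by $E_0$ rescales Z but leaves the state unchanged.

```python
import math


def partition_function(levels, beta):
    """Z = sum_n exp(-beta * E_n) over a finite list of energy levels, eq. (5.20)."""
    return sum(math.exp(-beta * e) for e in levels)


def occupation_probabilities(levels, beta):
    """Weights exp(-beta * E_n) / Z of the individual levels in the canonical state."""
    z = partition_function(levels, beta)
    return [math.exp(-beta * e) / z for e in levels]


beta, e = 0.7, 1.3  # arbitrary illustrative values with beta * e > 0

# Two-level system, eq. (5.21)
z_two = partition_function([0.0, e], beta)

# Harmonic oscillator: truncated sum versus the closed geometric-series form
z_osc_sum = partition_function([n * e for n in range(200)], beta)
z_osc_closed = 1.0 / (1.0 - math.exp(-beta * e))

# Shifting all levels by E_0 = 5 changes Z but not the probabilities
p_plain = occupation_probabilities([0.0, e], beta)
p_shifted = occupation_probabilities([5.0, 5.0 + e], beta)
print(z_two, z_osc_sum, z_osc_closed, p_plain, p_shifted)
```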

Independent modes are modelled by taking tensor products of single-mode algebras and adding their Hamiltonians, leading to spectra which are obtained by summing the eigenvalues of the modes in all possible ways. The resulting partition function is the product of the single-mode partition functions. From here, a thermodynamic limit leads to the properties of ideal gases. Nonideal gases, where interactions matter, can then be handled using the cumulant expansion, as indicated at the end of Section 5.3. The details are outside the scope of this book.
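The statement that independent modes multiply partition functions can be checked on small matrices. In this NumPy sketch (random Hermitian single-mode Hamiltonians serve as hypothetical test data), the Hamiltonian of two modes is $H = H_1\otimes 1 + 1\otimes H_2$, and the trace in (5.20) factorizes.

```python
import numpy as np


def z_of(h, beta):
    """Quantum partition function tr exp(-beta*h) of a Hermitian matrix h, eq. (5.20)."""
    return float(np.sum(np.exp(-beta * np.linalg.eigvalsh(h))))


def random_hermitian(n, rng):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2


rng = np.random.default_rng(0)
h1, h2 = random_hermitian(2, rng), random_hermitian(3, rng)

# Two independent modes: tensor product space, Hamiltonians added
h = np.kron(h1, np.eye(3)) + np.kron(np.eye(2), h2)

beta = 0.8
print(z_of(h, beta), z_of(h1, beta) * z_of(h2, beta))  # equal up to rounding

# The combined spectrum consists of all sums of single-mode eigenvalues
sums = np.sort(np.add.outer(np.linalg.eigvalsh(h1), np.linalg.eigvalsh(h2)).ravel())
print(np.allclose(np.linalg.eigvalsh(h), sums))
```

Because the spectrum of H is the set of all sums $E^{(1)}_m + E^{(2)}_n$, the double sum of exponentials splits into a product, which is exactly the factorization used for ideal gases.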



Since the Hamiltonian can be any Hermitian quantity, the quantum partition function 
formula (5.20) can in principle be used to compute the partition function of arbitrary 
quantized Hermitian quantities. 



5.3 Kubo product and generating functional 

The negative logarithm of the partition function, the so-called generating functional, plays 
a fundamental role in statistical mechanics. 

We first discuss a number of general properties, discovered by Gibbs [90], Peierls [190], Bogoliubov [33], Kubo [144], Mori [174], and Griffiths [100]. The somewhat technical
setting involving the Kubo inner product is necessary to handle noncommuting quantities 
correctly; everything would be much easier in the classical case. On a first reading, the 
proofs in this section may be skipped. 
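5.3.1 Proposition. Let f be Hermitian such that $e^{sf}$ is strongly integrable for all $s\in[-1,1]$. Then

$$\langle g;h\rangle_f := \langle g\,E_fh\rangle_f, \qquad (5.22)$$

where $E_f$ is the linear mapping defined for Hermitian f by

$$E_fh := \int_0^1 ds\, e^{-sf}h\,e^{sf},$$

defines a bilinear, positive definite inner product $\langle\cdot;\cdot\rangle_f$ on the algebra of quantities, called the Kubo (or Mori or Bogoliubov) inner product. For all g, h, the following relations hold:

$$\overline{\langle g;h\rangle_f} = \langle h^*;g^*\rangle_f, \qquad (5.23)$$

$$\langle g^*;g\rangle_f > 0 \quad\text{if } g \ne 0, \qquad (5.24)$$

$$\langle g;h\rangle_f = g\langle h\rangle_f \quad\text{if } g\in\mathbb C, \qquad (5.25)$$

$$\langle g;h\rangle_f = \langle gh\rangle_f \quad\text{if } g \text{ or } h \text{ commutes with } f, \qquad (5.26)$$

$$E_fg = g \quad\text{if } g \text{ commutes with } f. \qquad (5.27)$$

If $f = f(\lambda)$ depends continuously differentiably on the real parameter vector λ then

$$\frac{d}{d\lambda}e^{-f} = -\Big(E_f\frac{df}{d\lambda}\Big)e^{-f}. \qquad (5.28)$$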

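Proof. (i) We have

$$\overline{\langle g;h\rangle_f} = \langle (gE_fh)^*\rangle_f = \langle (E_fh)^*g^*\rangle_f = \Big\langle \int_0^1 ds\, e^{sf}h^*e^{-sf}g^*\Big\rangle_f = \int_0^1 ds\,\langle e^{sf}h^*e^{-sf}g^*\rangle_f.$$

By (EA3), the integrand equals

$$Z_f^{-1}\int e^{-f}e^{sf}h^*e^{-sf}g^* = Z_f^{-1}\int e^{-(1-s)f}h^*e^{-sf}g^* = Z_f^{-1}\int e^{-f}h^*e^{-sf}g^*e^{sf} = \langle h^*e^{-sf}g^*e^{sf}\rangle_f,$$

hence

$$\overline{\langle g;h\rangle_f} = \int_0^1 ds\,\langle h^*e^{-sf}g^*e^{sf}\rangle_f = \Big\langle h^*\int_0^1 ds\,e^{-sf}g^*e^{sf}\Big\rangle_f = \langle h^*E_fg^*\rangle_f = \langle h^*;g^*\rangle_f.$$

Thus (5.23) holds.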

(ii) Suppose that $g \ne 0$. For $s\in[0,1]$, we define $u = s/2$, $v = (1-s)/2$ and $g(s) := e^{-uf}g\,e^{-vf}$. Since f is Hermitian, $g(s)^* = e^{-vf}g^*e^{-uf}$, hence by (EA3) and (EA2),

$$\int e^{-f}g^*e^{-sf}g\,e^{sf} = \int e^{-vf}g^*e^{-2uf}g\,e^{-vf} = \int g(s)^*g(s) > 0,$$

so that

$$\langle g^*;g\rangle_f = \langle g^*E_fg\rangle_f = Z_f^{-1}\int_0^1 ds\int e^{-f}g^*e^{-sf}g\,e^{sf} > 0.$$

This proves (5.24), and shows that the Kubo inner product is positive definite.
(iii) If f and g commute then $g\,e^{sf} = e^{sf}g$, hence

$$E_fg = \int_0^1 ds\,e^{-sf}e^{sf}g = \int_0^1 ds\,g = g,$$

giving (5.27). The definition of the Kubo inner product then implies (5.26), and taking $g\in\mathbb C$ gives (5.25).

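(iv) The function q on [0, 1] defined by

$$q(s) := \Big(\frac{d}{d\lambda}e^{-sf}\Big)e^{sf} + \int_0^s dt\,e^{-tf}\frac{df}{d\lambda}e^{tf}$$

satisfies $q(0) = 0$ and

$$\frac{dq}{ds} = e^{-sf}\frac{df}{d\lambda}e^{sf} + \Big(\frac{d}{d\lambda}e^{-sf}\Big)f\,e^{sf} + \frac{d}{d\lambda}\big(-e^{-sf}f\big)e^{sf} = 0.$$

Hence q vanishes identically. In particular, $q(1) = 0$; multiplying on the right by $e^{-f}$ gives (5.28). □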
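For matrices, $E_f$ can be evaluated in closed form in an eigenbasis of f, since there the map $h\mapsto e^{-sf}he^{sf}$ acts entrywise. The NumPy sketch below uses this to implement the Kubo inner product (5.22), with the trace as integral, and checks (5.23), (5.24), (5.26) and (5.27) on random test matrices. The closed-form evaluation of $E_f$ and the helper names (`kubo_map`, `kubo`) are choices made for this sketch, not constructions from the text.

```python
import numpy as np


def herm_exp(f, t=1.0):
    """exp(t*f) for a Hermitian matrix f via its eigendecomposition."""
    w, v = np.linalg.eigh(f)
    return (v * np.exp(t * w)) @ v.conj().T


def kubo_map(f, h):
    """E_f h = int_0^1 e^{-sf} h e^{sf} ds, evaluated exactly in the eigenbasis of f.

    There the integral acts entrywise: h_ij -> h_ij (1 - e^{-(w_i - w_j)}) / (w_i - w_j),
    with factor 1 when w_i = w_j.
    """
    w, v = np.linalg.eigh(f)
    d = w[:, None] - w[None, :]
    small = np.abs(d) < 1e-12
    safe = np.where(small, 1.0, d)
    factor = np.where(small, 1.0, -np.expm1(-safe) / safe)
    return v @ ((v.conj().T @ h @ v) * factor) @ v.conj().T


def value(f, g):
    """<g>_f = tr(e^{-f} g) / tr(e^{-f})."""
    e = herm_exp(f, -1.0)
    return np.trace(e @ g) / np.trace(e)


def kubo(f, g, h):
    """Kubo inner product <g; h>_f = <g E_f h>_f, eq. (5.22)."""
    return value(f, g @ kubo_map(f, h))


def adj(x):
    return x.conj().T


rng = np.random.default_rng(7)
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
f = (a + adj(a)) / 4                                    # Hermitian
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
h = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
p = f @ f - 2 * f                                       # commutes with f

sym_ok = np.isclose(np.conj(kubo(f, g, h)), kubo(f, adj(h), adj(g)))   # (5.23)
pos = kubo(f, adj(g), g)                                                 # (5.24)
comm_ok = np.isclose(kubo(f, g, p), value(f, g @ p))                     # (5.26)
fixed_ok = np.allclose(kubo_map(f, p), p)                                # (5.27)
print(sym_ok, pos, comm_ok, fixed_ok)
```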



As customary in thermodynamics, we use differentials to express relations involving the 
differentiation by arbitrary parameters. To write (5.28) in differential form, we formally 
multiply by dλ, and obtain the quantum chain rule for exponentials,

$$de^{-f} = -(E_f\,df)\,e^{-f}. \qquad (5.29)$$

If the $f(\lambda)$ commute for all values of λ then the quantum chain rule reduces to the classical chain rule. Indeed, then f also commutes with $df/d\lambda$; hence $E_f\,df/d\lambda = df/d\lambda$, and $E_f\,df = df$.
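The quantum chain rule (5.29) can be tested against a finite-difference derivative. This NumPy sketch assumes the finite-dimensional case with $f(\lambda) = a + \lambda b$ for random Hermitian test matrices a and b (hypothetical data). The derivative of $e^{-f}$ agrees with $-(E_f\,df/d\lambda)e^{-f}$, but not with the naive classical formula unless $df/d\lambda$ commutes with f.

```python
import numpy as np


def herm_exp(f, t=1.0):
    """exp(t*f) for a Hermitian matrix f via its eigendecomposition."""
    w, v = np.linalg.eigh(f)
    return (v * np.exp(t * w)) @ v.conj().T


def kubo_map(f, h):
    """E_f h = int_0^1 e^{-sf} h e^{sf} ds, computed entrywise in the eigenbasis of f."""
    w, v = np.linalg.eigh(f)
    d = w[:, None] - w[None, :]
    small = np.abs(d) < 1e-12
    safe = np.where(small, 1.0, d)
    factor = np.where(small, 1.0, -np.expm1(-safe) / safe)
    return v @ ((v.conj().T @ h @ v) * factor) @ v.conj().T


def random_hermitian(n, rng):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 4


rng = np.random.default_rng(3)
a, b = random_hermitian(4, rng), random_hermitian(4, rng)  # f(lam) = a + lam*b
eps = 1e-5


def d_exp_minus_f(direction):
    """Central difference of lam -> exp(-(a + lam*direction)) at lam = 0."""
    plus = herm_exp(a + eps * direction, -1.0)
    minus = herm_exp(a - eps * direction, -1.0)
    return (plus - minus) / (2 * eps)


e_minus_a = herm_exp(a, -1.0)
quantum = -kubo_map(a, b) @ e_minus_a   # quantum chain rule (5.28)/(5.29)
naive = -b @ e_minus_a                  # classical chain rule, wrong unless a and b commute
c = a @ a                               # commutes with a, so E_a c = c
print(np.abs(d_exp_minus_f(b) - quantum).max(), np.abs(d_exp_minus_f(b) - naive).max())
```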



The following theorem is central to the mathematics of statistical mechanics. As will be 
apparent from the discussion in the next chapter, part (i) is the abstract mathematical 
form of the second law of thermodynamics, part (ii) allows the actual computation of 
thermal properties from microscopic assumptions, and part (iii) is the abstract form of the 
first law. 

5.3.2 Theorem. Let f be Hermitian such that $e^{sf}$ is strongly integrable for all $s\in[-1,1]$.

(i) The generating functional

$$W(f) := -\log\int e^{-f} \qquad (5.30)$$

is a concave function of the Hermitian quantity f. In particular,

$$W(g) \le W(f) + \langle g - f\rangle_f \qquad\text{(Gibbs-Bogoliubov inequality)} \qquad (5.31)$$

Equality holds in (5.31) iff f and g differ by a constant.

(ii) For Hermitian g, we have

$$W(f + \tau g) = W(f) - \log\langle e^{-f-\tau g}e^{f}\rangle_f. \qquad (5.32)$$
Moreover, the cumulant expansion

$$W(f+\tau g) = W(f) + \tau\langle g\rangle_f + \frac{\tau^2}{2}\big(\langle g\rangle_f^2 - \langle g;g\rangle_f\big) + O(\tau^3) \qquad (5.33)$$

holds if the coefficients are finite.

(iii) If $f = f(\lambda)$ and $g = g(\lambda)$ depend continuously differentiably on λ then the following differentiation formulas hold:

$$d\langle g\rangle_f = \langle dg\rangle_f - \langle g;df\rangle_f + \langle g\rangle_f\langle df\rangle_f, \qquad (5.34)$$

$$dW(f) = \langle df\rangle_f. \qquad (5.35)$$

(iv) The entropy of the state $\langle\cdot\rangle_f$ is

$$S = \bar k\,\big(f - W(f)\big). \qquad (5.36)$$
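Proof. We prove the assertions in reverse order.

(iv) Equation (5.30) says that $W(f) = -\log Z_f$, which together with (5.18) gives (5.36).

(iii) We have

$$d\int g\,e^{-f} = \int dg\,e^{-f} + \int g\,de^{-f} = \int dg\,e^{-f} - \int g\,E_fdf\,e^{-f} = \int (dg - gE_fdf)e^{-f} = Z_f\langle dg - gE_fdf\rangle_f.$$

On the other hand, $d\int g\,e^{-f} = d(Z_f\langle g\rangle_f) = dZ_f\langle g\rangle_f + Z_f\,d\langle g\rangle_f$, so that

$$dZ_f\langle g\rangle_f + Z_f\,d\langle g\rangle_f = Z_f\langle dg - gE_fdf\rangle_f = Z_f\langle dg\rangle_f - Z_f\langle g;df\rangle_f. \qquad (5.37)$$

In particular, for g = 1 we find by (5.25) that $dZ_f = -Z_f\langle 1;df\rangle_f = -Z_f\langle df\rangle_f$. Now (5.35) follows from $dW(f) = -d\log Z_f = -dZ_f/Z_f = \langle df\rangle_f$, and solving (5.37) for $d\langle g\rangle_f$ gives (5.34).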

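(ii) Equation (5.32) follows from

$$\langle e^{-h}e^{f}\rangle_f = Z_f^{-1}\int e^{-f}e^{-h}e^{f} = Z_f^{-1}\int e^{-h} = Z_h/Z_f$$

by taking logarithms and setting $h = f + \tau g$. To prove the cumulant expansion, we introduce the function φ defined by

$$\phi(\tau) := W(f + \tau g).$$

From (5.35), we find $\phi'(\tau) = \langle g\rangle_{f+\tau g}$ for f, g independent of τ, and by differentiating this again, using (5.34) with $dg = 0$,

$$\phi''(\tau) = \langle g\rangle_{f+\tau g}^2 - \langle g;g\rangle_{f+\tau g}.$$

In particular,

$$\phi'(0) = \langle g\rangle_f, \qquad \phi''(0) = \langle g\rangle_f^2 - \langle g\,E_fg\rangle_f = \langle g\rangle_f^2 - \langle g;g\rangle_f. \qquad (5.38)$$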
A Taylor expansion now implies (5.33).

(i) Since the Cauchy-Schwarz inequality for the Kubo inner product implies

$$\langle g\rangle_f^2 = \langle g;1\rangle_f^2 \le \langle g;g\rangle_f\langle 1;1\rangle_f = \langle g;g\rangle_f,$$

(5.38) implies that

$$\frac{d^2}{d\tau^2}W(f+\tau g)\Big|_{\tau=0} \le 0$$

for all f, g. This implies that W(f) is concave. Moreover, replacing f by f + sg, we find that $\phi''(s) \le 0$ for all s. The remainder form of Taylor's theorem therefore gives

$$\phi(\tau) = \phi(0) + \tau\phi'(0) + \int_0^\tau ds\,(\tau - s)\,\phi''(s) \le \phi(0) + \tau\phi'(0),$$

and for τ = 1 we get

$$W(f+g) \le W(f) + \langle g\rangle_f. \qquad (5.39)$$

Inequality (5.31) follows upon replacing g by g − f.

By the derivation, equality holds in (5.39) only if $\phi''(s) = 0$ for all $0 < s < 1$. By (5.38), applied with f + sg in place of f, this implies $\langle g\rangle_{f+sg}^2 = \langle g;g\rangle_{f+sg}$. Thus we have equality in the Cauchy-Schwarz argument, forcing g to be a multiple of 1. Therefore equality in the Gibbs-Bogoliubov inequality (5.31) is possible only if g − f is a constant. □
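Theorem 5.3.2 lends itself to a numerical sanity check in finite dimensions. The NumPy sketch below (trace as integral, random Hermitian test matrices, hypothetical helper names) checks the Gibbs-Bogoliubov inequality (5.31) including its equality case, the derivative formula (5.35), and the second-order coefficient (5.38) of the cumulant expansion together with its sign.

```python
import numpy as np


def herm_exp(f, t=1.0):
    """exp(t*f) for a Hermitian matrix f via its eigendecomposition."""
    w, v = np.linalg.eigh(f)
    return (v * np.exp(t * w)) @ v.conj().T


def kubo_map(f, h):
    """E_f h = int_0^1 e^{-sf} h e^{sf} ds, computed entrywise in the eigenbasis of f."""
    w, v = np.linalg.eigh(f)
    d = w[:, None] - w[None, :]
    small = np.abs(d) < 1e-12
    safe = np.where(small, 1.0, d)
    factor = np.where(small, 1.0, -np.expm1(-safe) / safe)
    return v @ ((v.conj().T @ h @ v) * factor) @ v.conj().T


def w_func(f):
    """Generating functional W(f) = -log tr e^{-f}, eq. (5.30), via a stable log-sum-exp."""
    lam = -np.linalg.eigvalsh(f)
    m = lam.max()
    return -(m + np.log(np.sum(np.exp(lam - m))))


def value(f, g):
    """<g>_f = tr(e^{-f} g) / tr(e^{-f}); only the real part is kept (g Hermitian here)."""
    e = herm_exp(f, -1.0)
    return (np.trace(e @ g) / np.trace(e)).real


def random_hermitian(n, rng):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2


rng = np.random.default_rng(5)
f, g = random_hermitian(5, rng), random_hermitian(5, rng)
one = np.eye(5)

# (5.31): the gap W(f) + <g - f>_f - W(g) is nonnegative ...
gb_gap = w_func(f) + value(f, g - f) - w_func(g)
# ... and vanishes when g - f is a constant
gb_gap_const = w_func(f) + value(f, 2.5 * one) - w_func(f + 2.5 * one)

# (5.35): dW = <df>_f, checked by a central difference along f + tau*g
tau = 1e-4
dw_numeric = (w_func(f + tau * g) - w_func(f - tau * g)) / (2 * tau)

# (5.38): second derivative <g>_f^2 - <g; g>_f, which is <= 0 (concavity)
tau2 = 1e-3
d2w_numeric = (w_func(f + tau2 * g) - 2 * w_func(f) + w_func(f - tau2 * g)) / tau2**2
d2w_formula = value(f, g) ** 2 - value(f, g @ kubo_map(f, g))
print(gb_gap, dw_numeric - value(f, g), d2w_numeric - d2w_formula)
```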

As a consequence of the Gibbs-Bogoliubov inequality, we derive an important inequality 
for the entropy. 

5.3.3 Theorem. Let $S_0$ be the entropy of a reference state. Then, for an arbitrary Gibbs state ⟨·⟩ with entropy S,

$$\langle S\rangle \le \langle S_0\rangle, \qquad (5.40)$$

with equality only if $S_0 = S$.

Proof. Let $f = S/\bar k$ and $g = S_0/\bar k$. Since S and $S_0$ are entropies, $W(f) = W(g) = 0$ and $\langle\cdot\rangle_f = \langle\cdot\rangle$, so the Gibbs-Bogoliubov inequality (5.31) gives $0 \le \langle g - f\rangle_f = \langle S_0 - S\rangle/\bar k$. This implies (5.40). If equality holds then equality holds in (5.31), so that $S_0$ and S differ only by a constant. But this constant vanishes since the values agree. □



The difference

$$\langle S_0 - S\rangle = \langle S_0\rangle - \langle S\rangle \ge 0 \qquad (5.41)$$

is known as relative entropy. In an information theoretical context (cf. Section 7.6), 
the relative entropy may be interpreted as the amount of information in a state (•) which 
cannot be explained by the reference state. This interpretation makes sense since the 
relative entropy vanishes precisely for the reference state. A large relative entropy therefore 
indicates that the state contains some important information not present in the reference 
state. 
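In the matrix case, the relative entropy (5.41) of two Gibbs states equals $\bar k\,\mathrm{tr}\,\rho(\log\rho - \log\sigma)$ with density matrices $\rho = e^{-S/\bar k}$ and $\sigma = e^{-S_0/\bar k}$. The NumPy sketch below (units with $\bar k = 1$, random test states, hypothetical helper names) computes it both ways and confirms that it is nonnegative and vanishes for the reference state itself.

```python
import numpy as np

K_BAR = 1.0  # units with k_bar = 1 (assumption for this sketch)


def herm_fun(x, fun):
    """Apply a scalar function to a Hermitian matrix through its eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * fun(w)) @ v.conj().T


def entropy(f):
    """S_f = k_bar (f + log Z_f), eq. (5.18), with Z_f = tr e^{-f}."""
    z = np.sum(np.exp(-np.linalg.eigvalsh(f)))
    return K_BAR * (f + np.log(z) * np.eye(len(f)))


def value(f, g):
    """<g>_f = tr(e^{-f} g) / tr(e^{-f}), real part (g Hermitian here)."""
    e = herm_fun(f, lambda w: np.exp(-w))
    return (np.trace(e @ g) / np.trace(e)).real


def random_hermitian(n, rng):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2


rng = np.random.default_rng(9)
f, f0 = random_hermitian(4, rng), random_hermitian(4, rng)
s, s0 = entropy(f), entropy(f0)          # state <.>_f and reference entropy S_0

relative = value(f, s0 - s)              # <S_0 - S>, eq. (5.41)

# Independent check through density matrices and matrix logarithms
rho = herm_fun(f, lambda w: np.exp(-w))
rho /= np.trace(rho)
sigma = herm_fun(f0, lambda w: np.exp(-w))
sigma /= np.trace(sigma)
klein = K_BAR * np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log))).real
print(relative, klein)
```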
Approximations. The cumulant expansion is the basis of a well-known approximation method in statistical mechanics. Starting from special reference states $\langle\cdot\rangle_f$ with explicitly known W(f) and $E_f$ (corresponding to so-called explicitly solvable models), one obtains inductively expressions for values in these states by applying the differentiation rules. (In the most important cases, the resulting formulas for the values are commonly referred to as a Wick theorem, cf. Wick [260], although the formulas are much older and were derived in 1918 by Isserlis [118]. For details, see textbooks on statistical mechanics, e.g., Huang [115], Reichl [205].)

From these, one can calculate the coefficients in the cumulant expansion; higher-order terms can be found by proceeding as in the proof, using further differentiation. This gives approximate generating functions (and, by differentiation, the associated values) for Gibbs states with an entropy close to that of the explicitly solvable reference state. From the resulting generating function and the differentiation formulas (5.34)-(5.35), one gets as before the values for the given state.

The best tractable reference state $\langle\cdot\rangle_f$ to be used for a given Gibbs state $\langle\cdot\rangle_g$ can be obtained by minimizing the upper bound in the Gibbs-Bogoliubov inequality (5.31) over all f for which an explicit generating function is known. Frequently, one simply approximates W(g) by the minimum of this upper bound,

$$W(g) \approx W_m(g) := \inf_f\big(W(f) + \langle g - f\rangle_f\big). \qquad (5.42)$$

Using $W_m(g)$ in place of W(g) defines a so-called mean field theory; cf. Callen [49]. For computations from first principles (quantum field theory), see, e.g., the survey by Berges et al. [119].
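A mean field theory in the sense of (5.42) can be illustrated on a tiny classical model. The sketch below assumes a hypothetical system of two Ising spins $s_1, s_2 = \pm1$ with $g = \beta(-Js_1s_2 - h(s_1+s_2))$, where the integral is the sum over the four configurations. It restricts the trial quantities to the explicitly solvable family $f = -\beta b(s_1 + s_2)$ of independent spins and approximates the infimum by a grid search over b.

```python
import itertools
import math


def exact_w(beta, j, h):
    """W(g) = -log sum_{s1, s2} exp(-g), by brute-force enumeration."""
    z = sum(math.exp(beta * (j * s1 * s2 + h * (s1 + s2)))
            for s1, s2 in itertools.product((-1, 1), repeat=2))
    return -math.log(z)


def gb_bound(beta, j, h, b):
    """W(f) + <g - f>_f for the solvable trial quantity f = -beta * b * (s1 + s2)."""
    t = math.tanh(beta * b)  # <s_i>_f; the spins are independent under f
    w_f = -2.0 * math.log(2.0 * math.cosh(beta * b))
    return w_f + beta * (-j * t * t - 2.0 * h * t + 2.0 * b * t)


def mean_field_w(beta, j, h):
    """W_m(g) of eq. (5.42), with the infimum approximated on a grid of trial fields b."""
    return min(gb_bound(beta, j, h, k / 1000.0) for k in range(-3000, 3001))


beta, j, h = 1.0, 0.5, 0.2
print(exact_w(beta, j, h), mean_field_w(beta, j, h))
```

Setting the derivative of the bound with respect to b to zero gives the familiar self-consistency condition $b = h + J\tanh(\beta b)$. For $J = 0$ the trial family contains the exact state, and the bound is attained.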

5.4 Limit resolution and uncertainty 

Definition 5.2.1 generalizes the expectation axioms of Whittle [259, Section 2.2] for classical probability theory. Indeed, the values of our quantities are traditionally called expectation values, and refer to the mean over an ensemble of (real or imagined) identically prepared systems.

In our treatment, we keep the notation with pointed brackets familiar from statistical mechanics, but use the more neutral term value for ⟨f⟩ to avoid any reference to probability or statistics. This keeps the formal machinery completely independent of controversial issues about the interpretation of probabilities. Statistics and measurements, where the probabilistic aspect enters directly, are discussed separately in Section 7.2.

The key to an interpretation of the values of quantities as objective, observer-independent 
properties is an analysis of the uncertainty inherent in the description of a system by a 
state, based on the following result. 



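5.4.1 Proposition. For Hermitian g,

$$\langle g\rangle^2 \le \langle g^2\rangle. \qquad (5.43)$$

Equality holds if $g = \langle g\rangle$.

Proof. Put $\overline g = \langle g\rangle$. Then $0 \le \langle (g - \overline g)^2\rangle = \langle g^2 - 2\overline g g + \overline g^2\rangle = \langle g^2\rangle - 2\overline g\langle g\rangle + \overline g^2 = \langle g^2\rangle - \langle g\rangle^2$. This gives (5.43). If $g = \overline g$ then equality holds in this argument. □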

5.4.2 Definition. The number

$$\operatorname{cov}(f,g) := \operatorname{Re}\langle (f - \overline f)^*(g - \overline g)\rangle$$

is called the covariance of $f, g\in E$. Two quantities f, g are called uncorrelated if $\operatorname{cov}(f,g) = 0$, and correlated otherwise. The number

$$\sigma(f) := \sqrt{\operatorname{cov}(f,f)}$$

is called the uncertainty of $f\in E$ in the state ⟨·⟩. The number

$$\operatorname{res}(g) := \sqrt{\langle g^2\rangle/\langle g\rangle^2 - 1} \qquad (5.44)$$

is called the limit resolution of a Hermitian quantity g with nonzero value ⟨g⟩.

Note that (E3) and (5.43) ensure that σ(f) and res(g) are nonnegative real numbers that vanish if f, g are constant, i.e., complex numbers, and g ≠ 0. These definitions extend the corresponding notions of elementary classical statistics, where E is a commutative algebra of random variables, to the present, more general situation; in a statistical context, the uncertainty σ(f) is referred to as the standard deviation.

There is no need to associate an intrinsic statistical meaning to the above concepts. We treat the uncertainty σ(f) and the limit resolution res(g) simply as an absolute and a relative uncertainty measure, respectively, specifying how accurately one can treat a quantity as a sharp number, given by its value.

In experimental practice, the limit resolution is a lower bound on the relative accuracy with which one can expect ⟨g⟩ to be determinable reliably³ from measurements of a single system at a single time. In particular, a quantity g is considered to be significant if res(g) ≪ 1, while it is noise if res(g) ≫ 1. If g is a quantity and $\hat g$ is a good approximation of its value then $\Delta g := g - \hat g$ is noise. Sufficiently significant quantities can be treated as deterministic; the analysis of noise is the subject of statistics.
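The distinction between significant quantities and noise can be made concrete with the energy of n independent two-level systems (levels 0 and E), a hypothetical example chosen for illustration. Since variances of independent modes add, $\mathrm{res}(H) = \sigma(H)/\langle H\rangle$ decreases like $n^{-1/2}$. The plain-Python sketch below computes it in closed form and cross-checks small n by summing over all microstates.

```python
import itertools
import math


def upper_level_probability(beta, e):
    """Probability of the excited level of one two-level system (levels 0, e)."""
    return math.exp(-beta * e) / (1.0 + math.exp(-beta * e))


def limit_resolution(n, beta, e=1.0):
    """res(H) = sqrt(<H^2>/<H>^2 - 1) = sigma(H)/|<H>| for H = sum of n two-level energies."""
    p = upper_level_probability(beta, e)
    mean = n * e * p
    sigma = math.sqrt(n * e * e * p * (1.0 - p))  # variances add for independent modes
    return sigma / abs(mean)


def brute_force_resolution(n, beta, e=1.0):
    """Same quantity by summing over all 2**n microstates (feasible only for small n)."""
    z = m1 = m2 = 0.0
    for occ in itertools.product((0, 1), repeat=n):
        energy = e * sum(occ)
        weight = math.exp(-beta * energy)
        z += weight
        m1 += weight * energy
        m2 += weight * energy ** 2
    m1, m2 = m1 / z, m2 / z
    return math.sqrt(m2 / m1 ** 2 - 1.0)


for n in (1, 100, 10**6, 6 * 10**23):
    print(n, limit_resolution(n, beta=1.0))
```

For a single system the energy is pure noise (res ≈ 1.65 at βE = 1), while for a mole of such systems res(H) is of order $10^{-12}$, so the energy can be treated as a sharp number.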

5.4.3 Proposition. For any state,

(i) $f \le g \implies \langle f\rangle \le \langle g\rangle$.

³ The situation is analogous to the limit resolution with which one can determine the longitude and latitude of a city such as Vienna. Clearly these are well-defined only up to some limit resolution related to the diameter of the city. No amount of measurements can reduce the uncertainty below about 10 km. For an extended object, the uncertainty in its position is conceptual, not just a lack of knowledge or precision. Indeed, a point may be defined to be an object in a state where the position has zero limit resolution.
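(ii) For $f, g\in E$,

$$\operatorname{cov}(f,g) = \operatorname{Re}\big(\langle f^*g\rangle - \langle f\rangle^*\langle g\rangle\big),$$

$$\langle f^*f\rangle = \langle f\rangle^*\langle f\rangle + \sigma(f)^2,$$

$$|\langle f\rangle| \le \sqrt{\langle f^*f\rangle}.$$

(iii) If f is Hermitian then $\overline f = \langle f\rangle$ is real and

$$\sigma(f) = \sqrt{\langle (f - \overline f)^2\rangle} = \sqrt{\langle f^2\rangle - \langle f\rangle^2}.$$

(iv) Two commuting Hermitian quantities f, g are uncorrelated iff $\langle fg\rangle = \langle f\rangle\langle g\rangle$.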

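Proof. (i) follows from (E1) and (E3).

(ii) The first formula holds since

$$\langle (f - \overline f)^*(g - \overline g)\rangle = \langle f^*g\rangle - \overline f^*\langle g\rangle - \langle f^*\rangle\overline g + \overline f^*\overline g = \langle f^*g\rangle - \langle f\rangle^*\langle g\rangle.$$

The second formula follows for g = f, using (E1), and the third formula is an immediate consequence.

(iii) follows from (E1) and (ii).

(iv) If f, g are Hermitian and commute then fg is Hermitian by Proposition 5.1.3(ii), hence ⟨fg⟩ is real. By (ii), $\operatorname{cov}(f,g) = \langle fg\rangle - \langle f\rangle\langle g\rangle$, and the assertion follows. □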

Formally, the essential difference between classical mechanics and quantum mechanics is the latter's lack of commutativity. While in classical mechanics there is in principle no lower limit to the uncertainties with which we can prepare the quantities in a system of interest, the quantum mechanical uncertainty relation for noncommuting quantities puts strict limits on the uncertainties in the preparation of microscopic states. Here, preparation is defined informally as bringing the system into a state such that measuring certain quantities f gives numbers that agree with the values ⟨f⟩ to an accuracy specified by given uncertainties.

We now discuss the limits of the accuracy to which this can be done. 

5.4.4 Proposition.

(i) The Cauchy-Schwarz inequality

$$|\langle f^*g\rangle|^2 \le \langle f^*f\rangle\langle g^*g\rangle$$

holds for all $f, g\in E$.

(ii) The uncertainty relation

$$\sigma(f)^2\sigma(g)^2 \ge |\operatorname{cov}(f,g)|^2 + \big|\tfrac12\langle f^*g - g^*f\rangle\big|^2$$

holds for all $f, g\in E$.
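(iii) For $f, g\in E$,

$$\operatorname{cov}(f,g) = \operatorname{cov}(g,f) = \tfrac12\big(\sigma(f+g)^2 - \sigma(f)^2 - \sigma(g)^2\big), \qquad (5.45)$$

$$|\operatorname{cov}(f,g)| \le \sigma(f)\sigma(g), \qquad (5.46)$$

$$\sigma(f+g) \le \sigma(f) + \sigma(g). \qquad (5.47)$$

In particular,

$$|\langle fg\rangle - \langle f\rangle\langle g\rangle| \le \sigma(f)\sigma(g) \quad\text{for commuting Hermitian } f, g. \qquad (5.48)$$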

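Proof. (i) For arbitrary $\alpha, \beta\in\mathbb C$ we have

$$0 \le \langle (\alpha f - \beta g)^*(\alpha f - \beta g)\rangle = \alpha^*\alpha\langle f^*f\rangle - \alpha^*\beta\langle f^*g\rangle - \beta^*\alpha\langle g^*f\rangle + \beta^*\beta\langle g^*g\rangle = |\alpha|^2\langle f^*f\rangle - 2\operatorname{Re}\big(\alpha^*\beta\langle f^*g\rangle\big) + |\beta|^2\langle g^*g\rangle.$$

We now choose $\beta = \langle f^*g\rangle^*$ and obtain for arbitrary real α the inequality

$$0 \le \alpha^2\langle f^*f\rangle - 2\alpha|\langle f^*g\rangle|^2 + |\langle f^*g\rangle|^2\langle g^*g\rangle. \qquad (5.49)$$

The further choice $\alpha = \langle g^*g\rangle$ gives

$$0 \le \langle g^*g\rangle^2\langle f^*f\rangle - \langle g^*g\rangle|\langle f^*g\rangle|^2.$$

If $\langle g^*g\rangle > 0$, we find after division by $\langle g^*g\rangle$ that (i) holds. And if $\langle g^*g\rangle \le 0$ then $\langle g^*g\rangle = 0$, and we have $\langle f^*g\rangle = 0$ since otherwise a tiny α produces a negative right-hand side in (5.49). Thus (i) also holds in this case.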

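(ii) Since $(f - \overline f)^*(g - \overline g) - (g - \overline g)^*(f - \overline f) = f^*g - g^*f$, it is sufficient to prove the uncertainty relation for the case of quantities f, g whose value vanishes. In this case, (i) implies

$$(\operatorname{Re}\langle f^*g\rangle)^2 + (\operatorname{Im}\langle f^*g\rangle)^2 = |\langle f^*g\rangle|^2 \le \langle f^*f\rangle\langle g^*g\rangle = \sigma(f)^2\sigma(g)^2.$$

The assertion follows since $\operatorname{Re}\langle f^*g\rangle = \operatorname{cov}(f,g)$ and

$$\operatorname{Im}\langle f^*g\rangle = \frac{1}{2i}\big(\langle f^*g\rangle - \langle f^*g\rangle^*\big) = \frac{1}{2i}\langle f^*g - g^*f\rangle.$$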

(iii) Again, it is sufficient to consider the case of quantities f, g whose value vanishes. Then

$$\sigma(f+g)^2 = \langle (f+g)^*(f+g)\rangle = \langle f^*f\rangle + \langle f^*g + g^*f\rangle + \langle g^*g\rangle = \sigma(f)^2 + 2\operatorname{cov}(f,g) + \sigma(g)^2, \qquad (5.50)$$

and (5.45) follows. (5.46) is an immediate consequence of (ii), and (5.47) follows easily from (5.50) and (5.46). Finally, (5.48) is a consequence of (5.46) and Proposition 5.4.3(iii). □
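Proposition 5.4.4 can be spot-checked with density matrices, taking $\langle x\rangle = \mathrm{tr}(\rho x)$. The NumPy sketch below restricts itself to Hermitian f and g (the case used for the Heisenberg relation below) and to random test states (hypothetical data). It verifies the uncertainty relation (ii) together with (5.46) and (5.47).

```python
import numpy as np

rng = np.random.default_rng(11)


def random_hermitian(n):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2


def random_density_matrix(n):
    """Positive matrix of unit trace; <x> := tr(rho x) defines a state (test data only)."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real


def val(rho, x):
    return np.trace(rho @ x)


def cov(rho, f, g):
    """cov(f, g) = Re <(f - <f>)^* (g - <g>)>, Definition 5.4.2."""
    eye = np.eye(len(rho))
    df = f - val(rho, f) * eye
    dg = g - val(rho, g) * eye
    return val(rho, df.conj().T @ dg).real


def sigma(rho, f):
    return np.sqrt(cov(rho, f, f))


def uncertainty_gap(rho, f, g):
    """sigma(f)^2 sigma(g)^2 minus the right-hand side of Proposition 5.4.4(ii)."""
    lhs = sigma(rho, f) ** 2 * sigma(rho, g) ** 2
    half_comm = 0.5 * val(rho, f.conj().T @ g - g.conj().T @ f)
    return lhs - (cov(rho, f, g) ** 2 + abs(half_comm) ** 2)


trials = [(random_density_matrix(5), random_hermitian(5), random_hermitian(5)) for _ in range(20)]
gaps = [uncertainty_gap(rho, f, g) for rho, f, g in trials]
print(min(gaps))  # nonnegative up to rounding
```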
If we apply Proposition 5.4.4(ii) to scalar position q and momentum p variables satisfying the canonical commutation relation

$$[q,p] = i\hbar, \qquad (5.51)$$

we obtain

$$\sigma(q)\sigma(p) \ge \tfrac12\hbar, \qquad (5.52)$$

the uncertainty relation of Heisenberg [110, 212]. It implies that no state exists where both position q and momentum p have arbitrarily small uncertainty.
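For the harmonic oscillator, (5.52) can be illustrated with truncated matrices. The sketch assumes units with $\hbar = \omega = 1$ and unit mass, and truncates the ladder operator to n levels, an approximation that only affects the highest level and is negligible for the states used here. In the ground state the bound $\sigma(q)\sigma(p) = \tfrac12$ is attained, while canonical states give $\tfrac12\coth(\beta/2)$.

```python
import numpy as np


def oscillator_ops(n):
    """Position q and momentum p (hbar = 1) from a ladder operator truncated to n levels."""
    a = np.diag(np.sqrt(np.arange(1, n)), k=1).astype(complex)  # a|k> = sqrt(k)|k-1>
    q = (a + a.conj().T) / np.sqrt(2.0)
    p = (a - a.conj().T) / (1j * np.sqrt(2.0))
    return q, p


def thermal_rho(n, beta):
    """Canonical state for the energy levels k = 0, ..., n-1 (units hbar*omega = 1)."""
    weights = np.exp(-beta * np.arange(n))
    return np.diag(weights / weights.sum()).astype(complex)


def sigma(rho, x):
    """Uncertainty sqrt(<x^2> - <x>^2) of a Hermitian x, Proposition 5.4.3(iii)."""
    mean = np.trace(rho @ x).real
    return np.sqrt(np.trace(rho @ x @ x).real - mean**2)


n = 60
q, p = oscillator_ops(n)
ground = np.zeros((n, n), dtype=complex)
ground[0, 0] = 1.0

print(sigma(ground, q) * sigma(ground, p))  # ground state: equality in (5.52)
for beta in (1.0, 2.0, 5.0):
    rho = thermal_rho(n, beta)
    print(beta, sigma(rho, q) * sigma(rho, p), 0.5 / np.tanh(beta / 2))
```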
Chapter 6

The laws of thermodynamics 



This chapter rederives the laws of thermodynamics from statistical mechanics, thus putting 
the phenomenological discussion of Chapter 4 on more basic foundations. 

We confine our attention to a restricted but very important class of Gibbs states, those 
describing thermal states. We introduce thermal states by selecting the quantities whose
values shall act as extensive variables in a thermal model. On this level, we shall be able to 
reproduce the phenomenological setting of the present section from first principles; see the 
discussion after Theorem 6.2.3. If the underlying detailed model is assumed to be known 
then the system function, and with it all thermal properties, are computable in principle, 
although we only hint at the ways to do this numerically. We also look at a hierarchy 
of thermal models based on the same bottom level description and discuss how to decide 
which description levels are appropriate. 

Although dynamics is important for systems not in global equilibrium, we completely ignore 
dynamical issues in this chapter. We take a strictly kinematic point of view, and look as 
before only at a single phase without chemical reactions. In principle, it is possible to 
extend the present setting to cover the dynamics of the nonequilibrium case and deduce 
quantitatively the dynamical laws of nonequilibrium thermodynamics (Beris & Edwards 
[29], Oettinger [186]) from microscopic properties, including phase formation, chemical 
reactions, and the approach to equilibrium; see, e.g., Balian [15], Grabert [97], Rau & Müller [203], Spohn [231].



6.1 The zeroth law: Thermal states 



Thermal states are special Gibbs states, used in statistical mechanics to model macroscopic 
physical systems that are homogeneous on the (global, local, microlocal, or quantum) level 
used for modeling. They have all the properties traditionally postulated in thermodynamics. 
While we discuss the lower levels on an informal basis, we consider in the formulas for 
notational simplicity mainly the case of global equilibrium, where there are only finitely 
many extensive variables. Everything extends, however, with (formally trivial but from a 
rigorous mathematical view nontrivial) changes to local and microlocal equilibrium, where
extensive variables are fields, provided the sums are replaced by appropriate integrals; cf. 
Oettinger [186]. 

In the setting of statistical mechanics, the intensive variables are, as in Section 4.1, numbers parameterizing the entropy and characterizing a particular system at
a particular time. To each admissible combination of intensive variables there is a unique 
thermal state providing values for all quantities. The extensive variables then appear as 
the values of corresponding extensive quantities. 

A basic extensive quantity present in each thermal system is the Hamilton energy H; 
it is identical to the Hamiltonian function (or operator) in the underlying dynamical de- 
scription of the classical (or quantum) system. In addition, there are further basic extensive 
quantities which we call $X_j$ ($j\in J$) and collect in a vector X, indexed by J. All other extensive quantities are expressible as linear combinations of these basic extensive quantities.
The number and meaning of the extensive variables depends on the type of the system; 
typical examples are given in Table 7.1 in Section 7.2. 

In the context of statistical mechanics (cf. Examples 5.1.8), the Euclidean *-algebra E 
is typically an algebra of functions (for classical physics) or linear operators (for quantum 
physics), and H is a particular function or linear operator characterizing the class of systems
considered. The form of the operators Xj depends on the level of thermal modeling; for 
further discussion, see Section 7.2. 

For qualitative theory and for deriving semi-empirical recipes, there is no need to know 
details about H or $X_j$; it suffices to treat them as primitive objects. The advantage we
gain from this less detailed setting is that a much simpler machinery than that of statistical 
mechanics proper suffices to reconstruct all of phenomenological thermodynamics. 

It is intuitively clear from the informal definition of extensive variables in Section 4.5 that the only functions of independent extensive variables that are again extensive are linear combinations, and it is a little surprising that the whole machinery of equilibrium thermodynamics follows from a formal version of the simple assumption that in thermal states the
entropy is extensive. We take this to be the mathematical expression of the zeroth law and 
formalize this assumption in a precise mathematical definition. 

6.1.1 Definition. A thermal system is defined by a family of Hermitian extensive variables H and $X_j$ ($j\in J$) from a Euclidean *-algebra. A thermal state of a thermal system is a Gibbs state whose entropy S is a linear combination of the basic extensive quantities of the form

$$S = T^{-1}\Big(H - \sum_{j\in J}\alpha_jX_j\Big) = T^{-1}(H - \alpha\cdot X) \qquad\text{(zeroth law of thermodynamics)} \qquad (6.1)$$

with suitable real numbers $T \ne 0$ and $\alpha_j$ ($j\in J$). Here α and X are the vectors with components $\alpha_j$ ($j\in J$) and $X_j$ ($j\in J$), respectively.



Thus the value of an arbitrary quantity g is

$$\overline g := \langle g\rangle = \int e^{-\beta(H - \alpha\cdot X)}g, \qquad (6.2)$$

where

$$\beta := \frac{1}{\bar kT}. \qquad (6.3)$$

The numbers $\alpha_j$ are called the intensive variables conjugate to $X_j$, the number T is called the temperature, and β the coldness. $\overline S$, $\overline H$, $\overline X$, T, and α are called the thermal variables of the system. Note that the extensive variables of traditional thermodynamics are in the present setting not represented by the extensive quantities H, $X_j$ themselves but by their values $\overline S, \overline H, \overline X$.

Since we can write the zeroth law (6.1) in the form

$$H = TS + \alpha\cdot X, \qquad (6.4)$$

called the Euler equation, the temperature T is considered to be the intensive variable conjugate to the entropy S.

6.1.2 Remarks. (i) As already discussed in Example 4.1.4 for the case of temperature, measuring intensive variables is based upon the empirical fact that two systems in contact where the free exchange of some extensive quantity is allowed tend to relax to a joint equilibrium state, in which the corresponding intensive variable is the same in both systems. If a small measuring device is brought into close contact with a large system, the joint equilibrium state will be only minimally different from the original state of the large system; hence the intensive variables of the measuring device will move to the values of the intensive variables of the large system in the location of the measuring device. This allows one to read off their value from a calibrated scale.

(ii) Many treatises of equilibrium thermodynamics take the possibility of measuring temperature to be the content of the zeroth law of thermodynamics. The present, different choice for the zeroth law has far-reaching consequences. Indeed, as we shall see, the definition implies the first and second law, and (together with a quantization condition) the third law of thermodynamics. Thus these become theorems rather than separately postulated laws.

(iii) We emphasize that the extensive quantities H and $X_j$ are independent of the intensive quantities T and α, while S, defined by (6.1), is an extensive quantity defined only when values for the intensive quantities are prescribed. From (6.1) it is clear that values also depend on the particular state a system is in. It is crucial to distinguish between the quantities H or $X_j$, which are part of the definition of the system but independent of the state (since they are independent of T and α), and their values $\overline H = \langle H\rangle$ or $\overline X_j = \langle X_j\rangle$, which change with the state.

(iv) In thermodynamics, the interest is restricted to the values of the thermal variables; in statistical mechanics, the values of the thermal variables determine a state of the microscopic system. In particular, the knowledge of the intensive variables allows one to compute the values (6.2) of arbitrary microscopic quantities, not only the extensive ones. Of course, these values don't give information about the position and momentum of individual particles but only about their means. For example, the mean velocity of an ideal monatomic gas at temperature T turns out to be $\langle v\rangle = 0$, and the mean velocity-squared is $\langle v^2\rangle = 3\bar kT/m$, where m is the mass of a gas particle.
(v) A general Gibbs state has an incredibly high complexity. Indeed, in the classical case, the specification of an arbitrary Gibbs state for 1 mole of a pure, monatomic substance such as Argon requires specifying the entropy S, a function of $6N_A \approx 36\cdot 10^{23}$ degrees of freedom. In comparison, a global equilibrium state of Argon is specified by three numbers T, P and μ, a local equilibrium state by three fields depending on four parameters (time and position) only, and a microlocal equilibrium state by three fields depending on seven parameters (time, position, and momentum). Thus global, local, and microlocal equilibrium states form a small minority in the class of all Gibbs states. It is remarkable that this small class of states suffices for the engineering-accuracy description of all macroscopic phenomena.

(vi) Of course, the number of thermal variables or fields needed to describe a system depends 
on the true physical situation. For example, a system that is in local equilibrium only 
cannot be adequately described by the few variables characterizing global equilibrium. The 
question of selecting the right set of extensive quantities for an adequate description is 
discussed in Section 7.2. 

(vii) The formulation (6.1) is almost universally used in practice. However, an arbitrary linear combination

$$S = \gamma H + h_0X_0 + \dots + h_sX_s \qquad (6.5)$$

can be written in the form (6.1) with $T = 1/\gamma$ and $\alpha_j = -h_j/\gamma$, provided that $\gamma \ne 0$; indeed, (6.5) is mathematically the more natural form, which also allows states of infinite temperature that are excluded in (6.1). This shows that the coldness β is a more natural variable than the temperature T; it figures prominently in statistical mechanics. Indeed, the formulas of statistical mechanics are continuous in β even for systems such as those considered in Example 6.2.5, where β may become zero or negative. The temperature T reaches in this case infinity, then jumps to minus infinity, and then continues to increase. According to Landau & Lifschitz [151], states of negative temperature, i.e., negative coldness, must therefore be considered to be hotter, i.e., less cold, than states of any positive temperature. On the other hand, in the limit T → 0, a system becomes infinitely cold, giving intuition for the unattainability of zero absolute temperature.

(viii) In mathematical statistics, there is a large body of work on exponential families,
which are essentially the mathematical equivalent of the concept of a thermal state over a
commutative algebra; see, e.g., Barndorff-Nielsen [22]. In this context, the values of
the extensive quantities define a sufficient statistic, from which the whole distribution can be
reconstructed (cf. Theorem 6.2.4 below and the remarks on objective probability in Section
5.4). This is one of the reasons why exponential families provide a powerful machinery for 
statistical inference; see, e.g., Bernardo & Smith [30]. For recent extensions to quantum 
statistical inference, see, e.g., Barndorff-Nielsen et al. [23] and the references there. 

(ix) For other axiomatic settings for deriving thermodynamics, which provide different 
perspectives, see Caratheodory [51], Haken [107], Jaynes [124], Katz [132], Emch & 
Liu [71], and Lieb & Yngvason [156]. 
6.2 The equation of state

Not every combination $(T, \alpha)$ of intensive variables defines a thermal state; the requirement
that $\langle 1\rangle = 1$ enforces a restriction of $(T, \alpha)$ to a manifold of admissible thermal states.

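6.2.1 Theorem. Suppose that $T > 0$.

(i) For any $\kappa > 0$, the system function $\Delta$ defined by

$$\Delta(T, \alpha) := \kappa T \log \int e^{-(H - \alpha\cdot X)/kT} \qquad (6.6)$$

is a convex function of $T$ and $\alpha$. It vanishes only if $T$ and $\alpha$ are the intensive variables of
a thermal state.

(ii) In a thermal state, the intensive variables are related by the equation of state

$$\Delta(T, \alpha) = 0. \qquad (6.7)$$

The state space is the set of $(T, \alpha)$ satisfying (6.7).

(iii) The values of the extensive variables are given by

$$\bar S = \Omega\,\frac{\partial \Delta}{\partial T}(T, \alpha), \qquad \bar X = \Omega\,\frac{\partial \Delta}{\partial \alpha}(T, \alpha), \qquad \Omega := k/\kappa > 0, \qquad (6.8)$$

and the phenomenological Euler equation

$$\bar H = T\bar S + \alpha\cdot\bar X. \qquad (6.9)$$

(iv) Regarding $\bar S$ and $\bar X$ as functions of $T$ and $\alpha$, the matrix

$$\Sigma := \begin{pmatrix} \dfrac{\partial \bar S}{\partial T} & \dfrac{\partial \bar S}{\partial \alpha} \\[2mm] \dfrac{\partial \bar X}{\partial T} & \dfrac{\partial \bar X}{\partial \alpha} \end{pmatrix} \qquad (6.10)$$

is symmetric and positive semidefinite; in particular, we have the Maxwell reciprocity
relations

$$\frac{\partial \bar X_i}{\partial \alpha_j} = \frac{\partial \bar X_j}{\partial \alpha_i}, \qquad \frac{\partial \bar X_i}{\partial T} = \frac{\partial \bar S}{\partial \alpha_i}, \qquad (6.11)$$

and the stability conditions

$$\frac{\partial \bar S}{\partial T} \ge 0, \qquad \frac{\partial \bar X_i}{\partial \alpha_i} \ge 0 \quad (i \in I). \qquad (6.12)$$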

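Proof. By Theorem 5.3.2(i), the function $\phi$ defined by

$$\phi(\alpha_0, \alpha) := \log \int e^{-(\alpha_0 H - \alpha\cdot X)} = -W(\alpha_0 H - \alpha\cdot X)$$

is a convex function of $\alpha_0$ and $\alpha$. Put $\Omega := k/\kappa$. Then, by Proposition 4.1.1,

$$\Delta(T, \alpha) = -\kappa T\, W\big(\beta(H - \alpha\cdot X)\big) = \kappa T\, \phi\Big(\frac{1}{kT}, \frac{\alpha}{kT}\Big) \qquad (6.13)$$

is also convex. The condition $\Delta(T, \alpha) = 0$ is equivalent to

$$\int e^{-\beta(H - \alpha\cdot X)} = 1, \qquad \beta = 1/kT,$$

the condition for a thermal state. This proves (i) and (ii).

(iii) The formulas for $\bar S$ and $\bar X$ follow by differentiation of (6.13) with respect to $T$ and $\alpha$,
using (5.35). Equation (6.9) follows by taking values in (6.4), noting that $T$ and $\alpha$ are real
numbers.

(iv) By (iii), the matrix

$$\Sigma = \Omega \begin{pmatrix} \dfrac{\partial^2 \Delta}{\partial T^2} & \dfrac{\partial^2 \Delta}{\partial T\,\partial \alpha} \\[2mm] \dfrac{\partial^2 \Delta}{\partial \alpha\,\partial T} & \dfrac{\partial^2 \Delta}{\partial \alpha^2} \end{pmatrix}$$

is $\Omega$ times the Hessian matrix of the convex function $\Delta$. Hence $\Sigma$ is symmetric and positive
semidefinite. (6.11) expresses the symmetry of $\Sigma$, and (6.12) holds since the diagonal entries of a
positive semidefinite matrix are nonnegative. □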
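The statements (6.6)-(6.9) can be checked numerically in a commuting toy model. The following Python sketch is not taken from the text: it assumes units with $k = 1$, takes $\kappa = 1$ (so $\Omega = 1$), and uses a hypothetical three-level model in which $H$ and a single extensive quantity $X = N$ are diagonal and the integral is the plain sum over levels. It solves the equation of state (6.7) for $\alpha$ by bisection at fixed $T$, obtains $\bar S$ and $\bar X$ from (6.8) by central differences, and compares them with the values computed directly in the thermal state.

```python
import math

K_B = 1.0    # Boltzmann constant k (units with k = 1; an assumption)
KAPPA = 1.0  # free scaling constant kappa, so Omega = k / kappa = 1

# Hypothetical commuting toy model: levels (h_i, n_i) of the energy H and of a single
# extra extensive quantity X = N; the integral is the plain sum over the levels.
LEVELS = [(1.0, 0), (0.5, 1), (2.0, 2)]


def system_function(T, alpha):
    """Delta(T, alpha) = kappa*T*log sum_i exp(-(h_i - alpha*n_i)/(k*T)), cf. (6.6)."""
    z = sum(math.exp(-(h - alpha * n) / (K_B * T)) for h, n in LEVELS)
    return KAPPA * T * math.log(z)


def solve_equation_of_state(T, lo=-50.0, hi=50.0):
    """Bisection for alpha with Delta(T, alpha) = 0, cf. (6.7); Delta increases in alpha."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if system_function(T, mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def check_theorem(T, eps=1e-6):
    """Return alpha, sum of weights, (6.8) values, and directly computed values."""
    alpha = solve_equation_of_state(T)
    omega = K_B / KAPPA
    # (6.8): values of the extensive variables from partial derivatives of Delta
    S = omega * (system_function(T + eps, alpha) - system_function(T - eps, alpha)) / (2 * eps)
    X = omega * (system_function(T, alpha + eps) - system_function(T, alpha - eps)) / (2 * eps)
    # Direct values: on the equation of state the weights e^{-S_i/k} sum to one
    p = [math.exp(-(h - alpha * n) / (K_B * T)) for h, n in LEVELS]
    H_bar = sum(pi * h for pi, (h, _) in zip(p, LEVELS))
    N_bar = sum(pi * n for pi, (_, n) in zip(p, LEVELS))
    S_bar = -K_B * sum(pi * math.log(pi) for pi in p)
    return alpha, sum(p), S, X, H_bar, N_bar, S_bar


for T in (0.5, 2.0):
    alpha, z, S, X, H_bar, N_bar, S_bar = check_theorem(T)
    print(f"T={T}: alpha={alpha:.6f}, S from (6.8)={S:.6f}, direct S={S_bar:.6f}")
```

Bisection is used only because $\Delta$ is strictly increasing in $\alpha$ when all $n_i \ge 0$ and some $n_i > 0$; any other scalar root finder would serve equally well.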



6.2.2 Remarks. (i) For $T < 0$, the same results hold, with the change that $\Delta$ is concave
instead of convex, $\Sigma$ is negative semidefinite, and the inequality signs in (6.12) are reversed.
This is a rare situation; it can occur only in (nearly) massless systems embedded out of
equilibrium within (much heavier) matter, such as spin systems (cf. Purcell & Pound
[200]), radiation fields in a cavity (cf. Hsu & Barakat [114]), or vortices in 2-dimensional
fluids (cf. Montgomery & Joyce [172], Eyink & Spohn [74]). A massive thermal
system couples significantly to kinetic energy. In this case, the total momentum $p$ is an
extensive quantity, related to the velocity $v$, the corresponding intensive variable, by
$p = Mv$, where $M$ is the extensive total mass of the system. From (6.8), we find that
$p = \Omega\,\partial\Delta/\partial v$, which implies that $\Delta = \Delta|_{v=0} + \frac{M}{2\Omega}v^2$. Since the mass is positive, this expression is
convex in $v$, not concave; hence $T > 0$. Thus, in a massive thermal system, the temperature
must be positive.

(ii) In the application, the free scaling constant $\kappa$ is usually chosen as $\kappa = k/\Omega$, where $\Omega$
is a measure of system size, e.g., the total volume or total mass of the system. In actual
calculations from statistical mechanics, the integral is usually a function of the system
size. To make the result independent of it, one performs the so-called thermodynamic limit
$\Omega \to \infty$; thus $\Omega$ must be chosen in such a way that this limit is nontrivial. Extensivity in
single phase global equilibrium then justifies treating $\Omega$ as an arbitrary positive factor.

In phenomenological thermodynamics (cf. Section 4.1), one makes suitable, more or less
heuristic assumptions on the form of the system function, while in statistical mechanics,
one derives its form from (6.7) and specific choices for the quantities $H$ and $X$ within one
of the settings described in Example 5.1.8. Given these choices, the main task is then
the evaluation of the system function (6.6), from which the values of all quantities can be
computed. The integral in (6.6) can often be evaluated approximately from the cumulant expansion (5.33)
and/or a mean field approximation (5.42).

An arbitrary Gibbs state is generally not a thermal state. However, we can try to approximate
it by an equilibrium state in which the extensive variables have the same values. The
next result shows that the slack (the difference between the left hand side and the right
hand side) in (6.14), which will turn out to be the microscopic form of the Euler inequality
(4.2), is always nonnegative and vanishes precisely in equilibrium. Thus it can be used as
a measure of how close the Gibbs state is to an equilibrium state.

6.2.3 Theorem. Let $\langle\cdot\rangle$ be a Gibbs state with entropy $S$. Then, for arbitrary $(T, \alpha)$
satisfying $T > 0$ and the equation of state (6.7), the values $\bar H = \langle H\rangle$, $\bar S = \langle S\rangle$, and
$\bar X = \langle X\rangle$ satisfy

$$\bar H \ge T\bar S + \alpha\cdot\bar X. \qquad (6.14)$$

Equality only holds if $S$ is the entropy of a thermal state with intensive variables $(T, \alpha)$.

Proof. The equation of state implies that $S_c := T^{-1}(H - \alpha\cdot X)$ is the entropy of a thermal
state. Now the assertion follows from Theorem 5.3.3, since $\langle S\rangle \le \langle S_c\rangle = T^{-1}(\langle H\rangle - \alpha\cdot\langle X\rangle)$,
with equality only if $S = S_c$. □
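In a hypothetical commuting toy model (units with $k = 1$, integral equal to the sum over levels; neither is taken from the text), the slack in (6.14) can be computed for an arbitrary Gibbs state with positive weights $p_i$. The sketch below checks numerically that the slack equals $kT$ times the relative entropy of $p$ with respect to the thermal weights $p^*$; for this commuting case that identity is an elementary computation, and it makes the nonnegativity and the equality case of Theorem 6.2.3 visible.

```python
import math
import random

K_B = 1.0  # Boltzmann constant (units with k = 1; an assumption)
LEVELS = [(1.0, 0), (0.5, 1), (2.0, 2)]  # hypothetical commuting levels (h_i, n_i)


def thermal_weights(T):
    """Solve the equation of state for alpha at temperature T; return (alpha, p*)."""
    def log_z(alpha):
        return math.log(sum(math.exp(-(h - alpha * n) / (K_B * T)) for h, n in LEVELS))

    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection: log Z is increasing in alpha
        mid = 0.5 * (lo + hi)
        if log_z(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    alpha = 0.5 * (lo + hi)
    return alpha, [math.exp(-(h - alpha * n) / (K_B * T)) for h, n in LEVELS]


def euler_slack(p, T, alpha):
    """H_bar - T S_bar - alpha X_bar for a Gibbs state with weights p_i > 0, cf. (6.14)."""
    H = sum(pi * h for pi, (h, _) in zip(p, LEVELS))
    X = sum(pi * n for pi, (_, n) in zip(p, LEVELS))
    S = -K_B * sum(pi * math.log(pi) for pi in p)  # entropy levels S_i = -k log p_i
    return H - T * S - alpha * X


def relative_entropy(p, q):
    """sum_i p_i log(p_i / q_i); nonnegative when p and q both sum to one."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


T = 1.0
alpha, p_star = thermal_weights(T)
p = [0.5, 0.3, 0.2]  # an arbitrary non-thermal Gibbs state
print(euler_slack(p, T, alpha), K_B * T * relative_entropy(p, p_star))
```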



As the theorem shows, everything of macroscopic interest is deducible from an explicit
formula for the system function. Hence one can use thermodynamics in many situations
very successfully as a phenomenological theory without having to bother about microscopic
details. It suffices that a phenomenological expression for $\Delta(T, \alpha)$ is available. In particular,
the phenomenological axioms from Section 4.1 now follow by specializing the above to a
standard system, characterized by the extensive quantities

$$H, \qquad X_0 = V, \qquad X_j = N_j \quad (j \ne 0), \qquad (6.15)$$

where, as before, $V$ denotes the (positive) volume of the system, and each $N_j$ denotes
the (nonnegative) number of molecules of a fixed chemical composition (we shall call these
particles of kind $j$). However, $H$ and the $N_j$ are now quantities from $\mathbb{E}$, rather than
thermal variables. We call

$$P := -\alpha_0 \qquad (6.16)$$

the pressure and

$$\mu_j := \alpha_j \quad (j \ne 0) \qquad (6.17)$$

the chemical potential of kind $j$; hence

$$\alpha\cdot X = -PV + \mu\cdot N.$$

Specializing the theorem, we find the phenomenological Euler equation

$$\bar H = T\bar S - P\bar V + \mu\cdot\bar N. \qquad (6.18)$$
Note that $\bar V = V$ since we took $V$ as system size. For reversible changes, we have the first
law of thermodynamics

$$d\bar H = T\,d\bar S - P\,d\bar V + \mu\cdot d\bar N \qquad (6.19)$$

and the Gibbs-Duhem equation

$$0 = \bar S\,dT - \bar V\,dP + \bar N\cdot d\mu. \qquad (6.20)$$

A comparison with Section 4.1 shows that dropping the bars from the values reproduces,
for $T > 0$, $P > 0$ and $S \ge 0$, the axioms of phenomenological thermodynamics, except for
the extensivity outside equilibrium (which has local equilibrium as its justification). The
assumption $T > 0$ was justified in Remark 6.2.2(i), and $S \ge 0$ will be justified in Section 6.5.
But there seem to be no theoretical arguments which show that the pressure of a standard
system in the above sense must always be positive. (At $T < 0$, negative pressure is possible;
see Example 6.2.5.) We'd appreciate getting information about this from readers of this
book.

Apart from boundary effects, whose role diminishes as the system gets larger, the exten- 
sive quantities scale linearly with the volume. In the thermodynamic limit, corresponding 
to an idealized system infinitely extended in all directions, the boundary effects disappear 
and the linear scaling becomes exact, although this can be proved rigorously only in simple 
situations, e.g., for hard sphere model systems (Yang & Lee [266]) or spin systems
(Griffiths [100]). A thorough treatment of the thermodynamic limit (e.g., Ruelle [218, 219],
Thirring [240], or, in the framework of large deviation theory, Ellis [70]) in general
needs considerably more algebraic and analytic machinery; e.g., one must work with more
abstract KMS-states in place of thermal states (these are limits of sequences of thermal
states still satisfying a KMS condition (5.19)). Moreover, proving the existence of the limit
requires detailed properties of the concrete microscopic description of the system. 

For very small systems, typically atomic clusters or molecules, $N$ is fixed and a canonical
ensemble without the $\mu\cdot N$ term is more appropriate. For the thermodynamics of small
systems (see, e.g., Bustamante et al. [48], Gross [102], Kratky [140]) such as a single
cluster of atoms, V is still taken as a fixed reference volume, but now changes in the physical 
volume (adsorption or dissociation at the surface) are not represented in the system, hence 
need not respect the thermodynamic laws. For large surfaces (e.g., adsorption studies in 
chromatography [131, 169]), a thermal description is achievable by including additional 
variables (surface area and surface tension) to account for the boundary effects; but clearly, 
surface terms scale differently with system size than bulk terms. 

Thus, whenever the thermal description is valid, computations can be done in a fixed
reference volume $V_0$, which we take as system size $\Omega$, and the true, variable volume $V$ can
always be represented in the Euclidean *-algebra as a real number, so that in particular
$\bar V = V$. Then (6.6) implies that, for the reference volume,

$$\Delta(T, P, \mu) = \Omega^{-1} kT \log\Big(e^{-\beta P V}\int e^{-\beta(H - \mu\cdot N)}\Big),$$

hence

$$\Delta(T, P, \mu) = \Omega^{-1} kT\big(\log Z(T, \mu) - \beta P\Omega\big) = P(T, \mu) - P, \qquad (6.21)$$

where

$$Z(T, \mu) := \int e^{-\beta(H - \mu\cdot N)} \qquad (6.22)$$

is the so-called grand canonical partition function of the system and

$$P(T, \mu) := \Omega^{-1} kT \log Z(T, \mu), \qquad (6.23)$$

while $P$ without argument is the parameter in the left hand side of (6.21). With our
convention of considering a fixed reference volume and treating the true volume $V$ as the
scale factor, $\Omega = V$ (otherwise a thermodynamic limit would be needed), this expression is
independent of $V$, since it relates intensive variables unaffected by scaling. The equation
of state (6.7) therefore takes the form

$$P = P(T, \mu). \qquad (6.24)$$

Quantitative expressions for the equation of state can often be computed from (6.22)-(6.23)
using the cumulant expansion (5.33) and/or a mean field approximation (5.42). Note that
these relations imply

$$Z(T, \mu) = e^{\beta P V}, \qquad \beta = 1/kT.$$

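Traditionally (see, e.g., Gibbs [90], Huang [115], Reichl [205]), the thermal state corresponding
to (6.21)-(6.23) is called a grand canonical ensemble, and the following results
are taken as the basis for microscopic calculations from statistical mechanics.

6.2.4 Theorem. For a standard system in global equilibrium, values of an arbitrary quantity
$g$ can be calculated from (6.22) and

$$\langle g\rangle = Z(T, \mu)^{-1}\int e^{-\beta(H - \mu\cdot N)} g. \qquad (6.25)$$

The values of the extensive quantities are given in terms of the equation of state (6.23) by

$$\bar S = V\,\frac{\partial P}{\partial T}(T, \mu), \qquad \bar N_j = V\,\frac{\partial P}{\partial \mu_j}(T, \mu) \qquad (6.26)$$

and the phenomenological Euler equation (6.18).

Proof. Equation (6.23), together with the equation of state (6.24) and $\Omega = V$, implies that
$e^{-\beta PV} = Z(T, \mu)^{-1}$; hence the entropy $S = T^{-1}(H + PV - \mu\cdot N)$ of the thermal state satisfies

$$e^{-S/k} = e^{-\beta PV} e^{-\beta(H - \mu\cdot N)} = Z(T, \mu)^{-1} e^{-\beta(H - \mu\cdot N)},$$

giving (6.25). The formulas in (6.26) follow from (6.8) and (6.21). □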



No thermodynamic limit was needed to derive the above results. Thus, everything holds,
though with large limit resolutions in measurements, even for single small systems
(Bustamante et al. [48], Gross [102], Kratky [140]).
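As a small-system illustration of Theorem 6.2.4, the Python sketch below enumerates all configurations of a hypothetical lattice gas with $M = 4$ independent sites (energy $\varepsilon$ per occupied site, so $H = \varepsilon N$; volume $V = M$ in lattice units; $k = 1$). It evaluates $Z(T,\mu)$ from (6.22), $P(T,\mu)$ from (6.23) and the values (6.25), and checks (6.26) and the Euler equation (6.18) with no thermodynamic limit. The model and all parameter values are assumptions made only for illustration.

```python
import itertools
import math

K_B = 1.0    # Boltzmann constant (units with k = 1; an assumption)
EPS = 0.3    # hypothetical energy of an occupied site
M_SITES = 4  # hypothetical number of sites; volume V = M_SITES in lattice units


def configurations():
    """Yield (H, N) for every occupation pattern of the lattice gas."""
    for occupation in itertools.product((0, 1), repeat=M_SITES):
        n = sum(occupation)
        yield EPS * n, n


def grand_partition(T, mu):
    """Z(T, mu) = sum over configurations of exp(-beta (H - mu N)), cf. (6.22)."""
    beta = 1.0 / (K_B * T)
    return sum(math.exp(-beta * (h - mu * n)) for h, n in configurations())


def pressure(T, mu):
    """P(T, mu) = V^{-1} kT log Z(T, mu), cf. (6.23) with Omega = V."""
    return K_B * T * math.log(grand_partition(T, mu)) / M_SITES


def values(T, mu):
    """Values of H, N and S in the grand canonical state, cf. (6.25)."""
    beta = 1.0 / (K_B * T)
    z = grand_partition(T, mu)
    H = N = S = 0.0
    for h, n in configurations():
        p = math.exp(-beta * (h - mu * n)) / z
        H += p * h
        N += p * n
        S -= K_B * p * math.log(p)
    return H, N, S


T, mu, d = 0.8, -0.1, 1e-6
V = M_SITES
H, N, S = values(T, mu)
P = pressure(T, mu)
dP_dT = (pressure(T + d, mu) - pressure(T - d, mu)) / (2 * d)
dP_dmu = (pressure(T, mu + d) - pressure(T, mu - d)) / (2 * d)
print(S, V * dP_dT, N, V * dP_dmu, H, T * S - P * V + mu * N)
```

Because the sites are independent, the same pressure also follows in closed form as $kT\log(1 + e^{-(\varepsilon-\mu)/kT})$, which gives a second check on the enumeration.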
6.2.5 Example. We consider the two level system from Example 5.2.4, using $\Omega = 1$ as
system size. From (6.22) and (6.23), we find $Z(T, \mu) = 1 + e^{-E/kT}$, hence

$$P(T, \mu) = kT\log\big(1 + e^{-E/kT}\big) = kT\log\big(e^{E/kT} + 1\big) - E.$$

From (6.25), we find

$$\bar H = \frac{E\,e^{-E/kT}}{1 + e^{-E/kT}} = \frac{E}{e^{E/kT} + 1}, \qquad T = \frac{E}{k\log(E/\bar H - 1)}.$$

(This implies that a two-level system has negative temperature and negative pressure if
$\bar H > E/2$.) The heat capacity $C := d\bar H/dT$ takes the form

$$C = \frac{E^2}{kT^2}\,\frac{e^{E/kT}}{\big(e^{E/kT} + 1\big)^2}.$$

It exhibits a pronounced maximum, the so-called Schottky bump (cf. Callen [49]),
from which $E$ can be determined. In view of (6.56) below, this allows the experimental
estimation of the spectral gap of a quantum system. The phenomenon persists to some
extent for multilevel systems; see Civitarese et al. [55].
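The location of the Schottky bump follows from maximizing $C$: writing $x = E/kT$, one has $C/k = x^2e^x/(e^x+1)^2$, and $d\log C/dx = 2/x - \tanh(x/2) = 0$ reduces to $x\tanh(x/2) = 2$. The Python sketch below (units with $k = E = 1$, an assumption) solves this equation by bisection and cross-checks $C = d\bar H/dT$ against a finite difference of $\bar H = E/(e^{E/kT}+1)$.

```python
import math


def mean_energy(T, E=1.0, k=1.0):
    """H_bar = E / (e^{E/kT} + 1) for the two-level system (Example 6.2.5)."""
    return E / (math.exp(E / (k * T)) + 1.0)


def heat_capacity(x):
    """C/k as a function of x = E/(kT): x^2 e^x / (e^x + 1)^2."""
    return x * x * math.exp(x) / (math.exp(x) + 1.0) ** 2


def schottky_peak():
    """Maximum of C: d/dx log C = 2/x - tanh(x/2) = 0, i.e. x*tanh(x/2) = 2."""
    lo, hi = 1.0, 5.0  # x*tanh(x/2) - 2 is increasing, negative at 1, positive at 5
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid * math.tanh(0.5 * mid) < 2.0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return x, heat_capacity(x)


x_peak, c_max = schottky_peak()
print(f"kT_peak = {1 / x_peak:.3f} E, C_max = {c_max:.3f} k")
```

The root is $x \approx 2.40$, so the peak sits near $kT \approx 0.417\,E$ with $C_{\max} \approx 0.439\,k$; reading off the peak temperature therefore gives the estimate $E \approx 2.40\,kT_{\text{peak}}$.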



6.3 The first law: Energy balance 

We now discuss relations between changes of the values of extensive or intensive variables,
as expressed by the first law of thermodynamics. To derive the first law in full generality,
we use the concept of reversible transformations introduced in Section 4.1. Corresponding
to such a transformation, there is a family of thermal states $\langle\cdot\rangle_\lambda$ defined by

$$\langle f\rangle_\lambda = \int e^{-\beta(\lambda)(H - \alpha(\lambda)\cdot X)} f, \qquad \beta(\lambda) = \frac{1}{kT(\lambda)}.$$

Important: In case of local or microlocal equilibrium, where the thermal system carries
a dynamics, it is important to note that reversible transformations are fictitious transformations
which have nothing to do with how the system changes with time, or whether a
process is reversible in the dynamical sense that both the process and the reverse process
can be realized dynamically. The time shift is generally not a reversible transformation.

We use differentials corresponding to reversible transformations; writing $f = S/k$, we can
delete the index $f$ from the formulas in Section 5.2. In particular, we write the Kubo inner
product (5.22) as

$$\langle g; h\rangle := \langle g; h\rangle_{S/k}. \qquad (6.27)$$

6.3.1 Proposition. The value $\bar g(T, \alpha) := \langle g(T, \alpha)\rangle$ of every (possibly $T$- and $\alpha$-dependent)
quantity $g(T, \alpha)$ is a state variable satisfying the differentiation formula

$$d\langle g\rangle = \langle dg\rangle - \langle g - \bar g; dS\rangle/k. \qquad (6.28)$$

Proof. That $\bar g$ is a state variable is an immediate consequence of the zeroth law (6.1) since
the entropy depends on $T$ and $\alpha$ only. The differentiation formula follows from (5.34) and
(6.27). □



6.3.2 Theorem. For reversible changes, we have the first law of thermodynamics

$$d\bar H = T\,d\bar S + \alpha\cdot d\bar X \qquad (6.29)$$

and the Gibbs-Duhem equation

$$0 = \bar S\,dT + \bar X\cdot d\alpha. \qquad (6.30)$$

Proof. Differentiating the equation of state (6.7), using the chain rule (4.10), and simplifying
using (6.8) gives the Gibbs-Duhem equation (6.30). If we differentiate the phenomenological
Euler equation (6.9), we obtain

$$d\bar H = T\,d\bar S + \bar S\,dT + \alpha\cdot d\bar X + \bar X\cdot d\alpha,$$

and using (6.30), this simplifies to the first law of thermodynamics. □

Because of the form of the energy terms in the first law (6.29), one often uses the analogy 
to mechanics and calls the intensive variables generalized forces, and differentials of 
extensive variables generalized displacements. 

For the Gibbs-Duhem equation, we give a second proof which provides additional insight.
Since $H$ and $X$ are fixed quantities for a given system, they do not change under reversible
transformations; therefore

$$dH = 0, \qquad dX = 0.$$

Differentiating the Euler equation (6.4) therefore gives the relation

$$0 = T\,dS + S\,dT + X\cdot d\alpha. \qquad (6.31)$$

On the other hand, $S$ depends explicitly on $T$ and $\alpha$, and by Corollary 5.1.11,

$$\langle dS\rangle = \int e^{-S/k}\,dS = -k\,d\Big(\int e^{-S/k}\Big) = -k\,d1 = 0; \qquad (6.32)$$

hence taking values in (6.31) implies again the Gibbs-Duhem equation. By combining equation
(6.31) with the Kubo product we get information about limit resolutions:

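6.3.3 Theorem.

(i) Let $g$ be a quantity depending continuously differentiably on the intensive variables $T$
and $\alpha$. Then

$$\langle g - \bar g; S - \bar S\rangle = kT\Big(\frac{\partial \bar g}{\partial T} - \Big\langle\frac{\partial g}{\partial T}\Big\rangle\Big), \qquad (6.33)$$
$$\langle g - \bar g; X_j - \bar X_j\rangle = kT\Big(\frac{\partial \bar g}{\partial \alpha_j} - \Big\langle\frac{\partial g}{\partial \alpha_j}\Big\rangle\Big). \qquad (6.34)$$

(ii) If the extensive variables $H$ and $X_j$ ($j \in J$) are pairwise commuting then

$$\langle (S - \bar S)^2\rangle = kT\,\frac{\partial \bar S}{\partial T}, \qquad (6.35)$$
$$\langle (X_j - \bar X_j)(S - \bar S)\rangle = kT\,\frac{\partial \bar X_j}{\partial T} \quad (j \in J), \qquad (6.36)$$
$$\langle (X_j - \bar X_j)(X_k - \bar X_k)\rangle = kT\,\frac{\partial \bar X_j}{\partial \alpha_k} \quad (j, k \in J), \qquad (6.37)$$
$$\mathrm{res}(S) = \frac{\sqrt{kT\,\partial \bar S/\partial T}}{|\bar S|}, \qquad \mathrm{res}(X_j) = \frac{\sqrt{kT\,\partial \bar X_j/\partial \alpha_j}}{|\bar X_j|} \quad (j \in J). \qquad (6.38)$$

Proof. Multiplying the differentiation formula (6.28) by $kT$ and using (6.31), we find, for
arbitrary reversible transformations,

$$kT\big(d\langle g\rangle - \langle dg\rangle\big) = \langle g - \bar g; S\rangle\,dT + \langle g - \bar g; X\rangle\cdot d\alpha.$$

Dividing by $d\lambda$ and choosing $\lambda = T$ and $\lambda = \alpha_j$, respectively, gives

$$\langle g - \bar g; S\rangle = kT\Big(\frac{\partial \bar g}{\partial T} - \Big\langle\frac{\partial g}{\partial T}\Big\rangle\Big), \qquad \langle g - \bar g; X_j\rangle = kT\Big(\frac{\partial \bar g}{\partial \alpha_j} - \Big\langle\frac{\partial g}{\partial \alpha_j}\Big\rangle\Big).$$

(i) follows upon noting that $\langle g - \bar g; h - \bar h\rangle = \langle g - \bar g; h\rangle$ since, by (5.25),

$$\langle g - \bar g; \bar h\rangle = \langle g - \bar g\rangle\bar h = (\langle g\rangle - \bar g)\bar h = 0.$$

If the extensive variables $H$ and $X_j$ ($j \in J$) are pairwise commuting then we can use (5.26)
to eliminate the Kubo inner product, and by choosing $g$ as $S$ and $X_j$, respectively, we find
(6.35)-(6.37) (for $g = S$, note that $\langle \partial S/\partial T\rangle = 0$ by (6.32)). The limit resolutions (6.38) now follow from (5.44) and the observation that

$$\langle (g - \bar g)^2\rangle = \langle (g - \bar g)g\rangle - \langle g - \bar g\rangle\bar g = \langle (g - \bar g)g\rangle = \langle g^2\rangle - \bar g^2.$$

The limit resolution (6.39) follows similarly from

$$\bar H^2\,\mathrm{res}(H)^2 = \langle H - \bar H; H - \bar H\rangle = T\langle H - \bar H; S - \bar S\rangle + \alpha\cdot\langle H - \bar H; X - \bar X\rangle.$$

□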

Note that higher order central moments can be obtained in the same way, substituting
more complicated expressions for $g$ and using the formulas for the lower order moments to
evaluate the right hand side of (6.33) and (6.34).
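For a canonical ensemble (volume and particle numbers fixed numbers, as in Example 5.2.4), $S - \bar S = (H - \bar H)/T$, and the first law with $d\bar V = d\bar N = 0$ gives $\partial\bar S/\partial T = T^{-1}\partial\bar H/\partial T$; so (6.35) becomes $\langle(H-\bar H)^2\rangle = kT^2\,\partial\bar H/\partial T$. The Python sketch below checks this numerically for a hypothetical discrete spectrum (units with $k = 1$, an assumption), comparing the variance of $H$ with a central difference of $\bar H$.

```python
import math

K_B = 1.0  # Boltzmann constant (units with k = 1; an assumption)
SPECTRUM = [0.0, 0.4, 0.4, 1.1, 2.5]  # hypothetical energy levels, listed with multiplicity


def canonical_mean_and_variance(T):
    """Mean and variance of H in the canonical state with weights e^{-E_n/kT}/Z."""
    w = [math.exp(-e / (K_B * T)) for e in SPECTRUM]
    z = sum(w)
    mean = sum(wi * e for wi, e in zip(w, SPECTRUM)) / z
    var = sum(wi * (e - mean) ** 2 for wi, e in zip(w, SPECTRUM)) / z
    return mean, var


def fluctuation_mismatch(T, d=1e-5):
    """<(H - H_bar)^2> - k T^2 dH_bar/dT, with the derivative by central differences."""
    mean_plus, _ = canonical_mean_and_variance(T + d)
    mean_minus, _ = canonical_mean_and_variance(T - d)
    _, var = canonical_mean_and_variance(T)
    return var - K_B * T * T * (mean_plus - mean_minus) / (2 * d)


for T in (0.3, 1.0, 3.0):
    print(T, fluctuation_mismatch(T))
```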



The extensive variables scale linearly with the system size $\Omega$ of the system. Hence, the
limit resolution of the extensive quantities is $O(\sqrt{k/\Omega})$ in regions of the state space where
the extensive variables depend smoothly on the intensive variables. Since $k$ is very small,
they are negligible unless the system considered is very tiny. Thus, macroscopic thermal
variables can generally be obtained with fairly high precision. The only exceptions are
states close to critical points where the extensive variables need not be differentiable, and
their derivatives may therefore become huge. In particular, in the thermodynamic limit
$\Omega \to \infty$, uncertainties are absent except close to a critical point, where they lead to critical
opalescence.
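The $1/\sqrt{\text{size}}$ scaling can be seen in a hypothetical system of $n$ independent two-level units with gap $E$, the number of units playing the role of the system size (an assumption made for illustration; energies in units of $E$). Means and variances of independent units add, so $\mathrm{res}(H) = \sqrt{n\,\mathrm{var}_1}/(n\,\mathrm{mean}_1)$ decreases like $n^{-1/2}$, as the sketch below shows.

```python
import math


def two_level_resolution(n_units, x=1.0):
    """res(H) = sigma_H / |H_bar| for n independent two-level units with gap E and
    x = E/(kT); energies in units of E. Means and variances of independent units add."""
    q = 1.0 / (math.exp(x) + 1.0)  # probability of the upper level in each unit
    mean_one, var_one = q, q * (1.0 - q)
    return math.sqrt(n_units * var_one) / (n_units * mean_one)


for n in (1, 100, 10_000, 6 * 10**23):
    print(n, two_level_resolution(n))
```

For a single unit the resolution is $e^{x/2}$, of order one, while for a mole of units it is of order $10^{-12}$.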

6.3.4 Corollary. For a standard thermal system,

(6.40)

Proof. Apply (6.35), (6.37) and (6.41) to a standard system. □

Note that $\mathrm{res}(V) = 0$ since we regarded $V$ as the system size, so that it is just a number.

The above results imply an approximate thermodynamic uncertainty relation

$$\Delta S\,\Delta T \ge kT \qquad (6.42)$$

for entropy $S$ and the logarithm $\log T$ of temperature (note that $\Delta T/T \approx \Delta\log T$), analogous to the Heisenberg uncertainty
relation (5.51) for position and momentum, in which the Boltzmann constant $k$ plays
a role analogous to Planck's constant $\hbar$. Indeed (Gilmore [91]), (6.42) can be derived by
observing that (6.40) may be interpreted approximately as $(\Delta S)^2 \ge kT\,\partial \bar S/\partial T$; together with
the first order Taylor approximation $\Delta S \approx \frac{\partial \bar S}{\partial T}\Delta T$, we find that

$$\Delta S\,\Delta T = (\Delta S)^2\Big(\frac{\partial \bar S}{\partial T}\Big)^{-1} \ge kT.$$

A similar argument gives the approximate uncertainty relation

$$\Delta N_j\,\Delta\mu_j \ge kT. \qquad (6.43)$$



6.4 The second law: Extremal principles 

The extremal principles of the second law of thermodynamics assert that in a nonthermal 
state, some energy expression depending on one of a number of standard boundary condi- 
tions is strictly larger than that of related thermal states. The associated thermodynamic 
potentials can be used in place of the system function to calculate all thermal variables 
given half of them. Thus, like the system function, thermodynamic potentials give a com- 
plete summary of the equilibrium properties of homogeneous materials. We only discuss 
the Hamilton potential

$$U(S, X) := \max_{T,\alpha}\{TS + \alpha\cdot X \mid \Delta(T, \alpha) = 0,\ T > 0\}$$

and the Helmholtz potential

$$A(T, X) := \max_{\alpha}\{\alpha\cdot X \mid \Delta(T, \alpha) = 0\};$$

other potentials can be handled in a similar way.

6.4.1 Theorem. (Second law of thermodynamics)

(i) In an arbitrary state,

$$\bar H \ge U(\bar S, \bar X),$$

with equality iff the state is a thermal state of positive temperature. The remaining thermal
variables are then given by

$$T = \frac{\partial U}{\partial S}(\bar S, \bar X), \qquad \alpha = \frac{\partial U}{\partial X}(\bar S, \bar X), \qquad (6.44)$$
$$\bar H = U(\bar S, \bar X). \qquad (6.45)$$

In particular, a thermal state of positive temperature is uniquely determined by the values
of $S$ and $X$.

(ii) Let $T > 0$. Then, in an arbitrary state,

$$\bar H - T\bar S \ge A(T, \bar X),$$

with equality iff the state is a thermal state of temperature $T$. The remaining thermal
variables are then given by

$$\bar S = -\frac{\partial A}{\partial T}(T, \bar X), \qquad \alpha = \frac{\partial A}{\partial X}(T, \bar X), \qquad (6.46)$$
$$\bar H = T\bar S + \alpha\cdot\bar X = A(T, \bar X) + T\bar S. \qquad (6.47)$$

In particular, a thermal state of positive temperature is uniquely determined by the values
of $T$ and $X$.

Proof. This is proved in the same way as Theorem 4.4.1; thus we give no details. □
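For the two-level system of Example 6.2.5, taken with $\Omega = 1$ and the volume $V$ as the only extensive quantity besides $H$ (so that $\alpha = -P$ and the equation of state reads $P = P(T)$), the Hamilton potential reduces to $U(S, V) = \max_{T>0}(TS - P(T)V)$. Since $P$ is convex in $T$, the maximizer solves $S/V = dP/dT$. The Python sketch below (units with $k = E = 1$ and a fixed temperature bracket, both assumptions) computes $U$ this way and checks (6.44), (6.45) and the homogeneity (6.48) numerically.

```python
import math

K_B, E = 1.0, 1.0  # units with k = E = 1 (an assumption)


def pressure(T):
    """Equation of state P(T) = kT log(1 + e^{-E/kT}) of Example 6.2.5 (Omega = 1)."""
    return K_B * T * math.log(1.0 + math.exp(-E / (K_B * T)))


def entropy_per_volume(T):
    """dP/dT = S_bar/V by (6.26); increases from 0 towards k log 2."""
    x = E / (K_B * T)
    return K_B * (math.log(1.0 + math.exp(-x)) + x / (math.exp(x) + 1.0))


def hamilton_potential(S, V=1.0, t_lo=0.05, t_hi=1e3):
    """U(S, V) = max_{T>0} (T S - P(T) V), returned with the maximizing T.
    The bracket [t_lo, t_hi] is an assumption; S/V must lie between the entropies there."""
    lo, hi = t_lo, t_hi
    for _ in range(200):  # bisection on the increasing function T -> dP/dT
        mid = math.sqrt(lo * hi)  # geometric midpoint, since T spans several decades
        if entropy_per_volume(mid) < S / V:
            lo = mid
        else:
            hi = mid
    T = math.sqrt(lo * hi)
    return T * S - pressure(T) * V, T


T0 = 0.7
S0 = entropy_per_volume(T0)
U0, T_star = hamilton_potential(S0)
print(T_star, U0, E / (math.exp(E / (K_B * T0)) + 1.0))
```

At the maximizer, $TS - P(T)V$ equals $\bar H$ by the Euler equation (6.18), which is what (6.45) asserts; the derivative check of (6.44) works because the $T$-dependence of the maximizer drops out at a stationary point.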



The additivity of extensive quantities is again reflected in corresponding properties of the 
thermodynamic potentials: 

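6.4.2 Theorem.

(i) The function $U(S, X)$ is a convex function of its arguments which is positive homogeneous
of degree 1, i.e., for real $\lambda, \lambda^1, \lambda^2 \ge 0$,

$$U(\lambda S, \lambda X) = \lambda U(S, X), \qquad (6.48)$$
$$U(\lambda^1 S^1 + \lambda^2 S^2, \lambda^1 X^1 + \lambda^2 X^2) \le \lambda^1 U(S^1, X^1) + \lambda^2 U(S^2, X^2). \qquad (6.49)$$

(ii) The function $A(T, X)$ is a convex function of $X$ which is positive homogeneous of degree
1, i.e., for real $\lambda, \lambda^1, \lambda^2 \ge 0$,

$$A(T, \lambda X) = \lambda A(T, X), \qquad (6.50)$$
$$A(T, \lambda^1 X^1 + \lambda^2 X^2) \le \lambda^1 A(T, X^1) + \lambda^2 A(T, X^2). \qquad (6.51)$$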






Proof. This is proved in the same way as Theorem 4.4.2; thus we give no details. □



The extremal principles imply energy dissipation properties for time-dependent states. 
Since the present kinematical setting does not have a proper dynamical framework, it is 
only possible to outline the implications without going much into details. 

6.4.3 Theorem. 

(i) For any time-dependent system for which $\bar S$ and $\bar X$ remain constant and which converges
to a thermal state with positive temperature, the Hamilton energy $\langle H\rangle$ attains its global
minimum in the limit $t \to \infty$.

(ii) For any time-dependent system maintained at fixed temperature $T > 0$, for which $\bar X$
remains constant and which converges to a thermal state, the Helmholtz energy $\langle H - TS\rangle$
attains its global minimum in the limit $t \to \infty$.

Proof. This follows directly from Theorem 6.4.1. □

This result is the shadow of a more general, dynamical observation (that, of course, cannot
be proved from kinematic assumptions alone but would require a dynamical theory). Indeed,
it is a universally valid empirical fact that in all natural time-dependent processes, energy is
lost or dissipated, i.e., becomes macroscopically unavailable, unless compensated by energy 
provided by the environment. Details go beyond the present framework, which adopts a 
strictly kinematic setting. 



6.5 The third law: Quantization

The third law of thermodynamics asserts that the value of the entropy is always nonnegative.
But it cannot be deduced from our axioms without making a further assumption, as a simple
example demonstrates.

6.5.1 Example. The algebra $\mathbb{E} = \mathbb{C}^N$ with pointwise operations is a Euclidean *-algebra
for any integral of the form

$$\int f := \sum_{n=1}^{N} w_n f_n \qquad (w_n > 0);$$

the axioms are trivial to verify. For this integral the state defined by

$$\langle f\rangle := \frac{1}{N}\sum_{n=1}^{N} f_n$$

is a state with entropy $S$ given by $S_n = k\log(N w_n)$. The value of the entropy

$$\bar S = \frac{1}{N}\sum_{n=1}^{N} S_n = \frac{k}{N}\log\prod_{n=1}^{N}(N w_n)$$

is negative if we choose the $w_n$ such that $\prod_{n=1}^{N} N w_n < 1$.
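Example 6.5.1 is easy to reproduce numerically. In the Python sketch below (units with $k = 1$; the weights are arbitrary illustrative choices), the entropy levels $S_n = k\log(Nw_n)$ are checked to reproduce the uniform state, and the value $\bar S$ becomes negative as soon as $\prod_n Nw_n < 1$.

```python
import math

K_B = 1.0  # Boltzmann constant (units with k = 1; an assumption)


def example_entropy(weights):
    """Entropy value in Example 6.5.1: integral sum_n w_n f_n, state (1/N) sum_n f_n."""
    N = len(weights)
    levels = [K_B * math.log(N * w) for w in weights]  # S_n = k log(N w_n)
    # the density e^{-S_n/k}, weighted by w_n, must reproduce the uniform weights 1/N
    for w, s in zip(weights, levels):
        assert abs(w * math.exp(-s / K_B) - 1.0 / N) < 1e-12
    return sum(levels) / N


print(example_entropy([1.0, 1.0, 1.0]))  # k log 3 > 0
print(example_entropy([0.1, 0.1, 0.1]))  # k log 0.3 < 0: the third law fails
```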



Thus, we need an additional condition which guarantees the validity of the third law. Since
the third law is also violated in classical statistical mechanics, which is a particular case
of the present setting, we need a condition which forbids the classical interpretation of our
axioms.

We take our inspiration from a simple information theoretic model of states discussed in
Section 7.6 below, which has this property. (Indeed, the third law is a necessary requirement
for the interpretation of the value of the entropy as a measure of internal complexity, as
discussed there.) There, the integral is a sum over the components, and, since functions
were defined componentwise,

$$\int F(f) = \sum_{n\in\mathcal{N}} F(f_n). \qquad (6.52)$$

We say that a quantity $f$ is quantized iff (6.52) holds with a suitable spectrum $\{f_n \mid n \in \mathcal{N}\}$
for all functions $F$ for which $F(f)$ is strongly integrable; in this case, the $f_n$ are called
the levels of $f$. For example, in the quantum setting all trace class linear operators are
quantized quantities, since these always have a discrete spectrum.

Quantization is the additional ingredient needed to derive the third law: 

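6.5.2 Theorem. (Third law of thermodynamics)

If the entropy $S$ is quantized then $\bar S \ge 0$. Equality holds iff the entropy has a single level
only ($|\mathcal{N}| = 1$).

Proof. We have

$$1 = \int e^{-S/k} = \sum_{n\in\mathcal{N}} e^{-S_n/k} = \sum_{n\in\mathcal{N}} p_n, \qquad (6.53)$$

where all $p_n = e^{-S_n/k} > 0$, and

$$\bar S = \int S e^{-S/k} = \sum_{n\in\mathcal{N}} S_n e^{-S_n/k} = \sum_{n\in\mathcal{N}} S_n p_n. \qquad (6.54)$$

If $\mathcal{N} = \{n\}$ then (6.53) implies $p_n = 1$, $S_n = 0$, and (6.54) gives $\bar S = 0$. And if $|\mathcal{N}| > 1$
then (6.53) gives $p_n < 1$, hence $S_n > 0$ for all $n \in \mathcal{N}$, and (6.54) implies $\bar S > 0$. □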
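When $S$ is quantized, (6.53)-(6.54) say that $\bar S = -k\sum_n p_n\log p_n$ with $\sum_n p_n = 1$, and each term is nonnegative because $0 < p_n \le 1$. The Python sketch below (units with $k = 1$, an assumption) evaluates this expression for random level weights and for a single level.

```python
import math
import random

K_B = 1.0  # Boltzmann constant (units with k = 1; an assumption)


def quantized_entropy_value(p):
    """S_bar = sum_n S_n p_n with levels S_n = -k log p_n, cf. (6.53)-(6.54)."""
    if any(pn <= 0.0 for pn in p) or abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("level weights must be positive and sum to one")
    return sum(-K_B * math.log(pn) * pn for pn in p)


random.seed(1)
for size in (1, 2, 5, 50):
    raw = [random.random() + 1e-9 for _ in range(size)]
    total = sum(raw)
    p = [r / total for r in raw]
    print(size, quantized_entropy_value(p))  # zero only for a single level
```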



In quantum chemistry, energy $H$, volume $V$, and particle numbers $N_1, \dots, N_s$ form a quantized
family of pairwise commuting Hermitian variables. Indeed, the Hamiltonian $H$ has
discrete energy levels if the system is confined to a finite volume, $V$ is a number, hence has
a single level only, and $N_j$ counts particles, hence has as levels the nonnegative integers. As
a consequence, the entropy $S = T^{-1}(H + PV - \mu\cdot N)$ is quantized, too, so that the third
law of thermodynamics is valid. The number of levels is infinite, so that the value of the
entropy is positive.

A zero value of the entropy (absolute zero) is therefore an idealization which cannot be
realized in practice. But Theorem 6.5.2 implies in this idealized situation that entropy and
hence the joint spectrum of $(H, V, N_1, \dots, N_s)$ can have a single level only.
This is the situation discussed in ordinary quantum mechanics (pure energy states at fixed
particle numbers). It is usually associated with the limit $T \to 0$, though at absolute temperature
$T = 0$, i.e., infinite coldness $\beta$, the thermal formalism fails (but low $T$ asymptotic
expansions are possible).

To see the behavior close to this limit, we consider for simplicity a canonical ensemble with
Hamiltonian $H$ (Example 5.2.4); thus the particle number is fixed. Since $S$ is quantized, the
spectrum of $H$ is discrete, so that there is a finite or infinite sequence $E_0 < E_1 < E_2 < \dots$
of distinct energy levels. Denoting by $P_n$ the (rank $d_n$) orthogonal projector to the $d_n$-dimensional
eigenspace with energy $E_n$, we have the spectral decomposition

$$\varphi(H) = \sum_{n\ge 0}\varphi(E_n)P_n$$

for arbitrary functions $\varphi$ defined on the spectrum. In particular,

$$e^{-\beta H} = \sum_{n\ge 0} e^{-\beta E_n}P_n.$$

The partition function is

$$Z = \operatorname{tr} e^{-\beta H} = \sum_{n\ge 0} e^{-\beta E_n}\operatorname{tr} P_n = \sum_{n\ge 0} e^{-\beta E_n} d_n.$$

As a consequence,

$$e^{-S/k} = Z^{-1}e^{-\beta H} = \frac{\sum_{n\ge 0} e^{-\beta(E_n - E_0)}P_n}{\sum_{n\ge 0} e^{-\beta(E_n - E_0)}d_n}, \qquad (6.55)$$

hence values take the form

$$\langle f\rangle = \int e^{-S/k} f = \frac{\sum_{n\ge 0} e^{-\beta(E_n - E_0)}\operatorname{tr} P_n f}{\sum_{n\ge 0} e^{-\beta(E_n - E_0)}d_n}.$$

From this representation, we see that only the energy levels $E_n$ with

$$E_n \le E_0 + O(kT)$$

contribute significantly to a canonical ensemble of temperature $T$. If the temperature $T$ is small enough,
so that $kT \ll E_2 - E_0$, the exponentials $e^{-\beta(E_n - E_0)}$ with $n \ge 2$ can be neglected, and we
find

$$e^{-S/k} \approx \frac{P_0 + e^{-\beta(E_1 - E_0)}P_1}{d_0 + e^{-\beta(E_1 - E_0)}d_1} = \frac{P_0}{d_0} + \frac{d_0 P_1 - d_1 P_0}{d_0\big(e^{\beta(E_1 - E_0)}d_0 + d_1\big)}. \qquad (6.56)$$

Thus, the system behaves essentially as the two level system discussed in Examples 5.2.4 and
6.2.5; the spectral gap $E_1 - E_0$ takes the role of $E$. In particular, if already $kT \ll E_1 - E_0$,
we find that

$$e^{-S/k} = d_0^{-1}P_0 + O\big(e^{-\beta(E_1 - E_0)}\big) \approx d_0^{-1}P_0 \qquad (\text{if } kT \ll E_1 - E_0)$$

is essentially the projector to the subspace of minimal energy, scaled to ensure trace one.
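The accuracy of the two-level approximation (6.56) can be checked with a small matrix model. The NumPy sketch below builds a hypothetical Hamiltonian $H = \sum_n E_n P_n$ with levels $E_0 < E_1 < E_2$ and degeneracies $d_0 = 2$, $d_1 = 1$, $d_2 = 2$ in a random orthonormal eigenbasis (units with $k = 1$; all of this is an illustrative assumption), and compares the exact $e^{-S/k}$ from (6.55) with (6.56) and with the cruder approximation $d_0^{-1}P_0$.

```python
import numpy as np


def gibbs_density(energies, projectors, beta):
    """Exact e^{-S/k} from (6.55); energies sorted increasingly, projectors P_n."""
    shifted = np.exp(-beta * (np.asarray(energies) - energies[0]))
    num = sum(w * P for w, P in zip(shifted, projectors))
    den = sum(w * np.trace(P) for w, P in zip(shifted, projectors))
    return num / den


def two_level_approximation(energies, projectors, beta):
    """(P_0 + x P_1) / (d_0 + x d_1) with x = e^{-beta (E_1 - E_0)}, cf. (6.56)."""
    x = np.exp(-beta * (energies[1] - energies[0]))
    P0, P1 = projectors[0], projectors[1]
    return (P0 + x * P1) / (np.trace(P0) + x * np.trace(P1))


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random orthonormal eigenbasis
energies = [0.0, 0.5, 3.0]                        # hypothetical levels E_0 < E_1 < E_2
blocks = [[0, 1], [2], [3, 4]]                    # degeneracies d_0 = 2, d_1 = 1, d_2 = 2
projectors = [Q[:, b] @ Q[:, b].T for b in blocks]

beta = 5.0
rho = gibbs_density(energies, projectors, beta)
rho2 = two_level_approximation(energies, projectors, beta)
rho0 = projectors[0] / np.trace(projectors[0])
print(np.linalg.norm(rho - rho2), np.linalg.norm(rho - rho0))
```

At $\beta = 5$ the weight neglected in (6.56) is of order $e^{-\beta(E_2-E_0)} \approx 3\cdot 10^{-7}$, while $d_0^{-1}P_0$ alone misses the $e^{-\beta(E_1-E_0)} \approx 0.08$ contribution of the first excited level.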
In the nondegenerate case, where the lowest energy eigenvalue is simple, there is a corresponding
normalized eigenvector $\psi$, unique up to a phase, satisfying the Schrödinger
equation

$$H\psi = E_0\psi, \qquad |\psi| = 1 \qquad (E_0 \text{ minimal}). \qquad (6.57)$$

In this case, the projector is $P_0 = \psi\psi^*$ and has rank $d_0 = 1$. Thus

$$e^{-S/k} = \psi\psi^* + O\big(e^{-\beta(E_1 - E_0)}\big)$$

has almost rank one, and the value takes the form

$$\langle g\rangle = \operatorname{tr} e^{-S/k} g \approx \operatorname{tr}\psi\psi^* g = \psi^* g\psi. \qquad (6.58)$$

In the terminology of quantum mechanics, $E_0$ is the ground state energy, the solution
$\psi$ of (6.57) is called the ground state, and

$$\langle g\rangle = \psi^* g\psi \qquad (6.59)$$

is the expectation of the observable $g$ in the ground state.

For a general state vector $\psi$ normalized to satisfy $\psi^*\psi = 1$, the formula (6.59) defines the
values in the pure state $\psi$. It is easily checked that (6.59) indeed defines a state in the
sense of Definition 5.2.1. These are not Gibbs states, but their idealized limiting cases.

Our derivation therefore shows that, unless the ground state is degenerate, a canonical
ensemble at sufficiently low temperature is in an almost pure state described by the quantum
mechanical ground state.
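The ground state limit (6.58) can also be observed numerically. The NumPy sketch below diagonalizes a random Hermitian matrix standing in for $H$ (a hypothetical example, units with $k = 1$), forms $e^{-\beta H}/\operatorname{tr}e^{-\beta H}$ with $\beta$ chosen so that $\beta(E_1 - E_0) = 40$, and compares the values $\operatorname{tr}(e^{-S/k}g)$ with $\psi^*g\psi$ for the ground state $\psi$ and a random real symmetric $g$.

```python
import numpy as np


def canonical_density(H, beta):
    """e^{-beta H} / tr e^{-beta H} for Hermitian H, via the eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)        # eigenvalues in ascending order
    w = np.exp(-beta * (evals - evals[0]))  # shift by E_0 to avoid overflow
    rho = (evecs * w) @ evecs.conj().T      # sum_n w_n v_n v_n^*
    return rho / w.sum(), evals, evecs


rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
H = (A + A.conj().T) / 2                    # hypothetical Hamiltonian
G = rng.standard_normal((6, 6))
G = G + G.T                                 # hypothetical observable g

gap_levels = np.linalg.eigvalsh(H)
beta = 40.0 / (gap_levels[1] - gap_levels[0])  # makes beta (E_1 - E_0) = 40
rho, evals, evecs = canonical_density(H, beta)
psi = evecs[:, 0]                           # ground state, unique up to a phase
value = np.trace(rho @ G)                   # <g> = tr(e^{-S/k} g)
ground = psi.conj() @ G @ psi               # psi^* g psi, cf. (6.59)
print(abs(value - ground))
```

The phase ambiguity of $\psi$ does not matter, since both $\psi\psi^*$ and $\psi^*g\psi$ are invariant under $\psi \to e^{i\theta}\psi$.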

Thus, the third law directly leads to the conventional form of quantum mechanics, which can 
therefore be understood as the low temperature limit of thermodynamics. It also indicates 
when a quantum mechanical description by a pure state is appropriate, namely always when 
the gap between the ground state energy and the next energy level is significantly larger 
than the temperature, measured in units of the Boltzmann constant. (This is the typical 
situation in most of quantum chemistry and justifies the use of the Born-Oppenheimer 
approximation in the absence of level crossing; cf. Smith [230], Yarkony [267]). Moreover, 
it gives the correct (mixed) form of the state in case of ground state degeneracy, and the 
form of the correction terms when the energy gap is not large enough for the ground state 
approximation to be valid. 



Chapter 7 

Models, statistics, and measurements 



In this chapter, we discuss the relation between models and reality. By necessity, the ratio 
between the number of words and the number of formulas is higher than in other chapters. 
Also, this topic is difficult and to some extent controversial since it touches on unresolved 
foundational issues about the meaning of probability and the interpretation of quantum 
mechanics. 

We discuss in more detail the relation between different thermal models constructed on the 
basis of the same Euclidean *-algebra by selecting different lists of extensive quantities. 

Moreover, the abstract setting introduced in the previous chapters is given both a de- 
terministic and a statistical interpretation through a careful discussion of the meaning of 
uncertainty and probability. 

The interpretation of probability, statistical mechanics, and (today intrinsically interwoven)
quantum mechanics has a long history, which has led to a huge number of publications.
Informative sources for the foundations of probability in general include Fine [78] and
Hacking [104]. For statistical mechanics, see Ehrenfest [68], ter Haar [239], Penrose
[191], Sklar [228], Grandy [264], and Wallace [254]. For the foundations of quantum
mechanics, see the reprint collection by Wheeler & Zurek [258] and Stapp [232],
Ballentine [18], Home & Whitaker [113], Peres & Terno [195], Schlosshauer 
[222]. 

7.1 Description levels 

To be able to apply the theory developed in the previous chapters, it is necessary to know
how quantities and states are related to reality. There is no fully objective way of defining 
this relation, since objectivity is always restricted to concepts, and these are imposed upon 
nature by us observers. The same object may be described from different perspectives and 
at different levels of faithfulness. 

Therefore, the observer modeling a particular situation must make some basic choices, and 
these choices are subjective: Different observers may choose to study different materials
or different experiments, or they may study the same material or the same experiment in
different levels of detail, or draw the system boundary differently. For example, one
observer may regard a measuring instrument as part of the system of interest, while for 
another observer it only serves as recording device. 

We shall see that once the basic choices are made that unambiguously specify the system of
interest, everything else can be described objectively. On the other hand, silently changing 
the definition of what constitutes the system of interest is a major reason for apparent 
paradoxes discussed in the literature, and it requires care to disentangle the problems 
involved and to arrive at a clear view. 

In practice, relevant quantities and corresponding states are assigned to real-life situations
by well-informed and experimentally testable judgment concerning one's equipment. Indeed,
from a practical point of view, theory defines what an object is: A gas is regarded as
an ideal gas if it behaves to a satisfactory degree like the mathematical model of an ideal
gas for certain values of temperature, pressure, and volume. Similarly, a solid is regarded as a
crystal if it behaves to a satisfactory degree like the mathematical model of a crystal for
suitable numerical values of the model parameters^. In general, we know or assume, on the
basis of past experience, claims of manufacturers, etc., that certain materials or machines
reliably produce states that, to a satisfactory degree for the purpose of the experiment or
application, depend only on variables that are accounted for in our theory and that are,
to a satisfactory degree, either fixed or controllable. The nominal state of a system can be
checked and, if necessary, corrected by calibration, using appropriate measurements that
reveal the parameters characterizing the state.

In the preceding chapter, we assumed a fixed selection of extensive quantities defining the 
thermal model. However, as indicated at the end of Section 4.1, observable differences 
from the conclusions derived from a thermal model imply that one or more conjugate pairs 
of thermal variables are missing in the model. So, how should the extensive quantities 
be selected? We first emphasize the flexibility of the thermal setting. While the zeroth
law may look very restrictive at first sight, by choosing a large enough family of extensive 
quantities the entropy of an arbitrary Gibbs state can be approximated arbitrarily well by a 
linear combination of these quantities. This does not solve the selection problem but gives 
a useful perspective: 

The zeroth law appears simply as an embodiment of Ockham's razor^ [187], freely paraphrased
in modern form: we should opt for the most economical model explaining a
phenomenon, by restricting attention to the relevant extensive quantities only. At each
time t there is, except in degenerate cases, a single Gibbs state, with entropy S(t),
say, which best describes the system under consideration at the chosen level of modeling.
Assuming the description by the Gibbs state as fundamental, its value is the objective,
true value of the entropy, relative only to the algebra of quantities chosen to model the
system. A description of the state in terms of a thermal system is therefore adequate if
(and, under an observability qualification to be discussed below, only if), for all relevant
times t, the entropy S(t) can be adequately approximated by a linear combination of the
extensive quantities available at the chosen level of description.

^ cf. Callen [49, p. 15]: "Operationally, a system is in an equilibrium state if its properties are
consistently described by thermodynamic theory."

^ frustra fit per plura quod potest fieri per pauciora (it is futile to do with more what can be done with fewer)

The set of extensive variables depends on the application and on the desired accuracy of
the model; it must be chosen in such a way that knowing the measured values of the extensive
variables determines (to the accuracy specified) the complete behavior of the thermal
system. Thus, the choice of extensive variables is (to the accuracy specified) completely
determined by the level of accuracy with which the thermal description should fit the system's
behavior. This forces everything else: The theory must describe the freedom available to
characterize a particular thermal system with this set of extensive variables, and it must
describe how the numerical values of interest can be computed for each state of each thermal
system.
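The flexibility claim made earlier, that with a large enough family of extensive quantities a linear combination approximates the entropy of an arbitrary Gibbs state arbitrarily well, can be checked on a toy model. The Python sketch below is an illustration under stated assumptions, not a construction taken from the text: it uses a hypothetical system with six microstates, an arbitrary probability vector `rho`, and, purely for convenience, the powers x, x^2, ..., x^5 of a state label x as the extensive quantities, together with a constant that the normalization of the Gibbs state can absorb. It reports the least-squares distance from $S = -k\log\rho$ (with $k = 1$) to the span of the constant and the first m quantities.

```python
import math


def entropy_fit_residuals(rho, quantities):
    """Least-squares distance from S = -log(rho) (units with k = 1) to the span
    of a constant and the first m quantities, for m = 0, 1, ..., len(quantities).

    Uses modified Gram-Schmidt, so each residual is an orthogonal projection."""
    s = [-math.log(p) for p in rho]
    basis = []      # orthonormal vectors spanning the quantities used so far
    residuals = []

    def remove_span(v):
        # Subtract the components of v along the current orthonormal basis.
        v = list(v)
        for u in basis:
            c = sum(a * b for a, b in zip(u, v))
            v = [a - c * b for a, b in zip(v, u)]
        return v

    for q in [[1.0] * len(s)] + quantities:
        w = remove_span(q)
        norm = math.sqrt(sum(a * a for a in w))
        if norm > 1e-12:  # ignore a quantity already in the span
            basis.append([a / norm for a in w])
        r = remove_span(s)
        residuals.append(math.sqrt(sum(a * a for a in r)))
    return residuals


# Hypothetical example: 6 microstates labelled x = 0, 0.2, ..., 1.
rho = [0.05, 0.10, 0.30, 0.25, 0.20, 0.10]
xs = [i / 5 for i in range(6)]
powers = [[x ** d for x in xs] for d in range(1, 6)]
res = entropy_fit_residuals(rho, powers)
for m, r in enumerate(res):
    print(f"{m} quantities: residual {r:.3e}")
```

The residuals cannot increase as quantities are added, and they vanish (up to rounding) once the quantities, together with the constant, span all functions on the six states. A least-squares distance is used here only as the simplest measure of closeness; the comparison of description levels below is phrased in terms of the relative entropy instead.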

In contrast to the information-theoretic approach, where the choice of extensive quantities
is considered to be the subjective matter of which observables an observer happens to have
knowledge of, the only subjective aspect of our setting is the choice of the resolution of
modeling. This fixes the amount of approximation tolerable in the ansatz, and hence the
necessary list of extensive quantities. (Clearly, physics cannot be done without approximation,
and the choice of a resolution is unavoidable. To remove even this trace of subjectivity,
inherent in any approximation of anything, the entropy would have to be represented without
any approximation, which would require using the algebra of quantities of the still
unknown theory of everything, and demanding that the extensive quantities exhaust this
algebra.)

In general, which quantities need to be considered depends on the resolution with which
the system is to be modeled: the higher the resolution, the larger the family of extensive
quantities. Thus, whether we describe bulk matter, surface effects, impurities, fatigue,
decay, chemical reactions, or transition states, the thermal setting remains the same, since
it is a universal approximation scheme, while the number of degrees of freedom increases
with increasingly detailed models.

In phenomenological thermodynamics, the relevant extensive quantities are precisely those
variables that are observed to make a difference in modeling the phenomenon of interest.
Table 7.1 gives typical extensive variables, their intensive conjugate variables, and their
contribution to the Euler equation^. Some of the extensive variables and their intensive
conjugates are vectors or (in elasticity theory, the theory of complex fluids, and in the
relativistic case) tensors; cf. Balian [17] for the electromagnetic field and Beris & Edwards
[29], Öttinger [186] for complex fluids.

^ The Euler equation, which contains the energy contributions specified in the table, looks like an energy
balance. But since S is undefined, this formal balance has no content apart from defining the entropy S in
terms of the energy and other contributions. The energy balance is rather given by the first law discussed
later, and is about changes in energy. Conservative work contributions are exact differentials. For example,
the mechanical force $F = -dV(q)/dq$ translates into the term $-F\cdot dq = dV(q)$ of the first law, corresponding
to the term $-F\cdot q$ in the Euler equation. The change of the kinetic energy $E_{\rm kin} = mv^2/2$ contribution of
linear motion with velocity $v$ and momentum $p = mv$ is $dE_{\rm kin} = d(mv^2/2) = mv\cdot dv = v\cdot dp$, which is
exactly what one gets from the $v\cdot p$ contribution in the Euler equation. Since $v\cdot p = mv^2$ is larger than the
kinetic energy, this shows that motion implies a contribution to the entropy of $(E_{\rm kin} - v\cdot p)/T = -mv^2/(2T)$.
A similar argument applies to the angular motion of a rigid body in its rest frame, providing the term
involving angular velocity and angular momentum.

Table 7.1: Typical conjugate pairs of thermal variables and their contribution to the Euler
equation. The signs are fixed by tradition. (In the gravitational term, $m$ is the vector with
components $m_j$, the mass of a particle of kind $j$, $g$ the acceleration of gravity, and $h$ the
height.)

extensive $X_j$              intensive $\alpha_j$                           contribution $\alpha_j X_j$

entropy $S$                  temperature $T$                                thermal, $TS$

particle number $N_j$        chemical potential $\mu_j$                     chemical, $\sum_j \mu_j N_j$
conformation tensor $C$      relaxation force $R$                           conformational, $\frac12\sum_{jk} R_{jk}C^{jk}$

strain $\varepsilon^{jk}$    stress $\sigma_{jk}$                           elastic, $\sum_{jk}\sigma_{jk}\varepsilon^{jk}$
volume $V$                   pressure $-P$                                  mechanical, $-PV$
surface $A_S$                surface tension $\gamma$                       mechanical, $\gamma A_S$
length $L$                   tension $J$                                    mechanical, $JL$
displacement $q$             force $-F$                                     mechanical, $-F\cdot q$
momentum $p$                 velocity $v$                                   kinetic, $v\cdot p$
angular momentum $J$         angular velocity $\Omega$                      rotational, $\Omega\cdot J$

charge $Q$                   electric potential $\Phi$                      electrical, $\Phi Q$
polarization $P$             electric field strength $E$                    electrical, $E\cdot P$
magnetization $M$            magnetic field strength $B$                    magnetical, $B\cdot M$
electromagnetic field $F$    electromagnetic field strength $-F^{\mu\nu}$   electromagnetic, $-\sum_{\mu\nu}F_{\mu\nu}F^{\mu\nu}$

mass $M = m\cdot N$          gravitational potential $gh$                   gravitational, $ghM$
energy-momentum $U$          metric $g$                                     gravitational, $\frac12\sum_{\mu\nu}g_{\mu\nu}U^{\mu\nu}$

To analyze the relation between two different thermal description levels, we compare a coarse
system and a more detailed system quantitatively, taking for simplicity the temperature
constant, so that the T-dependence can be suppressed in the formulas and states are
completely determined by $\alpha$.

The fine system will be written as before; the variables and quantities associated with the
coarser system get an additional index c. In order to be able to compare the two systems,
we assume that one is a refinement of the other, so that the extensive quantities of the
coarse system are $X_c = CX$, with a fixed matrix $C$ with linearly independent rows, whose
components tell how the components of $X_c$ are built from those of $X$. The entropy of the
coarse system is then given by

$$S_c = T^{-1}(H - \alpha_c\cdot X_c) = T^{-1}(H - \alpha_c\cdot CX) = T^{-1}(H - \alpha^*\cdot X),$$

where

$$\alpha^* = C^T\alpha_c. \tag{7.1}$$
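Relation (7.1) says that the coarse thermal states are the detailed ones with $\alpha$ restricted to vectors of the form $C^T\alpha_c$. The Python sketch below illustrates this on a hypothetical toy model with four microstates and two extensive quantities, in units with $k = T = 1$; the energies, the values of $X$, and the single-row matrix $C = (1\ 1)$ are illustrative assumptions. For a given detailed $\alpha$, it finds the coarse $\alpha_c$ minimizing the relative entropy $\langle S_c - S\rangle$ of (5.41) by bisection, relying on the standard fact for exponential families that the minimizer reproduces the mean of $CX$ in the detailed state.

```python
import math

# Hypothetical toy model (units with k = T = 1): four microstates with
# energies H_i and two extensive quantities X_i = (X_i1, X_i2).
H = [0.0, 1.0, 1.5, 2.5]
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
C = (1.0, 1.0)  # one coarse quantity X_c = X_1 + X_2


def gibbs(alpha):
    """Probabilities exp(-S_i) of the Gibbs state with S_i = H_i - alpha.X_i + log Z."""
    w = [math.exp(-(h - sum(a * x for a, x in zip(alpha, xi)))) for h, xi in zip(H, X)]
    z = sum(w)
    return [wi / z for wi in w]


def lift(alpha_c):
    """alpha* = C^T alpha_c, as in (7.1), for a single coarse quantity."""
    return tuple(c * alpha_c for c in C)


def relative_entropy(p, q):
    """<S_q - S_p> in the state p, i.e. sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_cx(p):
    """Mean value of the coarse quantity CX in the state p."""
    return sum(pi * sum(c * x for c, x in zip(C, xi)) for pi, xi in zip(p, X))


def best_coarse(alpha, lo=-50.0, hi=50.0, steps=200):
    """alpha_c minimizing the relative entropy; the mean of CX increases with alpha_c."""
    target = mean_cx(gibbs(alpha))
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mean_cx(gibbs(lift(mid))) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for alpha in [(0.3, 0.3), (1.0, -1.0)]:
    ac = best_coarse(alpha)
    d = relative_entropy(gibbs(alpha), gibbs(lift(ac)))
    print(alpha, "alpha_c =", round(ac, 6), "relative entropy =", round(d, 6))
```

The first detailed state lies in the coarse subspace, so the fitted $\alpha_c$ is 0.3 and the relative entropy is essentially zero; the second does not, and a strictly positive relative entropy remains, signalling information that the coarse model cannot explain.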
Thus, the thermal states of the coarse model are simply the states of the detailed model
for which the intensive parameter vector $\alpha$ is of the form $\alpha = C^T\alpha_c$ for some $\alpha_c$. Hence the
coarse state space can simply be viewed as a lower-dimensional subspace of the detailed
state space. Therefore, one expects the coarse description to be adequate precisely when
the detailed state is close to the coarse state space, with an accuracy determined by the
desired fidelity of the coarse model. Since the relative entropy (5.41),

$$\langle S_c - S\rangle = \langle T^{-1}(H - \alpha_c\cdot CX) - T^{-1}(H - \alpha\cdot X)\rangle = \langle T^{-1}(\alpha - C^T\alpha_c)\cdot X\rangle, \tag{7.2}$$

measures the amount of information in the detailed state which cannot be explained by the
coarse state, it is sensible to associate to an arbitrary detailed state $\alpha$ the coarse state $\alpha_c$
determined by minimizing (7.2). If $\alpha^* = C^T\alpha_c \approx \alpha$, then

$$S_c = T^{-1}(H - \alpha^*\cdot X) \approx T^{-1}(H - \alpha\cdot X) = S,$$

and the coarse description is adequate. If $\alpha^* \not\approx \alpha$, there is no a priori reason to trust the
coarse model, and we have to investigate to what extent its predictions differ significantly
from those of the detailed model. One expects the differences to be significant; in practice,
however, difficulties arise when there are limits on our ability to prepare particular
detailed states. The reason is that entropy and chemical potentials can be prepared and
measured only by comparison with sufficiently well-known states. A first sign of this
is the gauge freedom in ideal gases discussed in Example 4.1.4, which implies that different
models of the same situation may have nontrivial differences in Hamilton energy, entropy,
and chemical potential. This ambiguity persists in more perplexing situations:

7.1.1 Example. (The Gibbs paradox) 

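Suppose that we have an ideal gas of two kinds $j = 1, 2$ of particles which are experimentally
indistinguishable. Suppose that, in the samples available for experiments, the two kinds are
mixed in significantly varying proportions $N_1 : N_2 = q_1 : q_2$ which, by assumption, have
no effect on the observable properties; in particular, their values are unknown but varying.
The detailed model treats them as distinct, the coarse model as identical. Reverting to the
barless notation of Section 4.1, we have

$$X = \begin{pmatrix} V\\ N_1\\ N_2\end{pmatrix},\qquad PV = kT\sum_{j=1}^{2}N_j,\qquad H = \sum_{j=1}^{2}h_j(T)N_j,\qquad \mu_j = kT\log\frac{N_j}{V\pi_j(T)},$$

and, assuming $C = \begin{pmatrix}1 & 0 & 0\\ 0 & c_1 & c_2\end{pmatrix}$ for suitable $c_1, c_2 > 0$,

$$X_c = CX = \begin{pmatrix} V\\ N_c\end{pmatrix},\qquad N_c = c_1N_1 + c_2N_2.$$

From the proportions $N_1 : N_2 = q_1 : q_2$, we find

$$x_j := \frac{N_j}{N_c} = \frac{q_j}{c_1q_1 + c_2q_2}.$$

The mixture behaves like an ideal gas of a single kind, hence

$$PV = kTN_c,\qquad H = h_c(T)N_c,\qquad \mu_c = kT\log\frac{N_c}{V\pi_c(T)}.$$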
Now $N_c = (kT)^{-1}PV = \sum_j N_j = \sum_j x_jN_c$ implies that $x_1 + x_2 = 1$. Because of
indistinguishability, this must hold for any choice of $q_1, q_2 \ge 0$; for the two choices $q_1 = 0$ and
$q_2 = 0$, we get $c_1 = c_2 = 1$, hence $N_c = \sum_j N_j$, and the $x_j$ are mole fractions. Similarly,
if we use for all kinds $j$ the same normalization for fixing the gauge freedom, the relation
$h_c(T)N_c = H = \sum_j h_j(T)N_j = \sum_j h_j(T)x_jN_c$ implies for varying mole fractions that
$h_j(T) = h_c(T)$ for $j = 1, 2$. From this, we get $\pi_j(T) = \pi_c(T)$ for $j = 1, 2$. Thus

$$H - H_c = \sum_j h_j(T)N_j - h_c(T)N_c = 0,$$
$$\mu_j - \mu_c = kT\log\frac{N_j}{V\pi_j(T)} - kT\log\frac{N_c}{V\pi_c(T)} = kT\log x_j,$$

the Gibbs energy satisfies

$$G - G_c = \sum_j \mu_jN_j - \mu_cN_c = \sum_j(\mu_j - \mu_c)N_j = kTN_c\sum_j x_j\log x_j,$$

and the entropy satisfies

$$S - S_c = T^{-1}(H + PV - G) - T^{-1}(H_c + PV - G_c) = T^{-1}(G_c - G) = -kN_c\sum_j x_j\log x_j.$$

This term is called the entropy of mixing. Its occurrence is referred to as the Gibbs paradox
(cf. Jaynes [125], Tseng & Caticha [244], Allahverdyan & Nieuwenhuizen [7],
Uffink [246, Section 5.2]). It seems to say that there are two different entropies, depending
on how we choose to model the situation. For fixed mole fractions, the paradox can be
resolved upon noticing that the fine and the coarse descriptions differ only by a choice of
gauge; the gauge is unobservable anyway, and the entropy is determined only when the
gauge is fixed.

However, if the mole fractions vary, the fine and the coarse descriptions differ significantly.
If the detailed model is correct, the coarse model gives a wrong description of the entropy
and the chemical potentials. But this difference is observable only if we know
processes which affect the different kinds differently, such as a difference in mass, which
allows a mechanical separation, in molecular size or shape, which allows their separation by
a semipermeable membrane, in spin, which allows a magnetic separation, or in scattering
properties of the particles, which may allow a chemical or radiation-based differentiation.
In each of these cases, the particles become distinguishable, and the coarse description
becomes inadequate.

If we cannot separate the kinds to some extent, we cannot prepare equilibrium states at 
fixed mole fraction. But this would be necessary to calibrate the chemical potentials, since 
fixed chemical potentials can be prepared only through chemical contact with substances 
with known chemical potentials, and the latter must be computed from mole fractions. 
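The bookkeeping of Example 7.1.1 can be checked numerically. The Python sketch below is only an illustration: it assumes units with $k = 1$, the ideal gas chemical potentials $\mu_j = kT\log(N_j/(V\pi_j(T)))$ with a hypothetical common value $\pi_1(T) = \pi_2(T) = \pi_c(T)$, $c_1 = c_2 = 1$, and arbitrary particle numbers. It computes $G - G_c$ from the chemical potentials of the detailed and the coarse model, obtains $S - S_c = T^{-1}(G_c - G)$ from $TS = H + PV - G$ (the Euler equation with the signs of Table 7.1, using $H = H_c$ and the same $PV$ in both models), and compares the result with the entropy of mixing $-kN_c\sum_j x_j\log x_j$.

```python
import math

K = 1.0  # Boltzmann constant (assumed units)


def gibbs_paradox_gap(n1, n2, T=300.0, V=1.0, pi_T=2.5):
    """Return (S - S_c, G - G_c) for the two-kind ideal gas.

    pi_T is a hypothetical common value of pi_1(T) = pi_2(T) = pi_c(T)."""
    nc = n1 + n2                                                # c_1 = c_2 = 1
    mu = [K * T * math.log(n / (V * pi_T)) for n in (n1, n2)]  # detailed model
    mu_c = K * T * math.log(nc / (V * pi_T))                    # coarse model
    g = mu[0] * n1 + mu[1] * n2
    g_c = mu_c * nc
    # H = H_c and PV is the same in both models, so T (S - S_c) = G_c - G.
    return (g_c - g) / T, g - g_c


def mixing_entropy(n1, n2):
    """-k N_c sum_j x_j log x_j with mole fractions x_j = N_j / N_c."""
    nc = n1 + n2
    return -K * nc * sum(x * math.log(x) for x in (n1 / nc, n2 / nc) if x > 0)


ds, dg = gibbs_paradox_gap(3.0, 1.0)
print(ds, mixing_entropy(3.0, 1.0))  # agree up to rounding
```

The arbitrary values of $T$, $V$, and $\pi_c(T)$ drop out of $S - S_c$, which depends only on the mole fractions and on $N_c$; for equal amounts of the two kinds it equals $kN_c\log 2$.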

Generalizing from the example, we conclude that even when both a coarse model and a more
detailed model are faithful to all experimental information possible at a given description
level, there is no guarantee that they agree in the values of all thermal variables of the coarse
model. In the language of control theory (see, e.g., Ljung [159]), agreement is guaranteed
only when all parameters of the more detailed model are observable.

On the other hand, all observable state functions of the detailed system which depend only
on the coarse state must and will have the same value within the experimental accuracy
if both models are adequate descriptions of the situation. Thus, while the values of some
variables need not be experimentally determinable, the validity of a model is an objective
property. Therefore, preferences for one or the other of two valid models can only be based
on other criteria. The criterion usually employed in this case is Ockham's razor^, although
there may be differences of opinion on what counts as the most economical model. In particular,
a fundamental description of macroscopic matter by means of quantum mechanics is
hopelessly overspecified in terms of the number of degrees of freedom needed for comparison
with experiment, most of which are in principle unobservable by equipment made of
ordinary matter. But it is often the most economical model in terms of description length
(though extracting the relevant information from it may be difficult). Thus, different people
may well make different rational choices, or employ several models simultaneously.

The objectivity of a model description implies that, as soon as a discrepancy with
experiment is reliably found, the model must be replaced by a more detailed (or altogether
different) model. This is indeed what happened with the textbook example of the Gibbs
paradox situation, ortho and para hydrogen, cf. Bonhoeffer & Harteck [41], Farkas
[76]. Hydrogen seemed at first to be a single substance, but then thermodynamic data
forced a refined description.

Similarly, in spin echo experiments (see, e.g., Hahn [105, 106], Rothstein [216], Ridderbos
& Redhead [209]), the specially prepared system appears to be in equilibrium but,
according to Callen's empirical definition^, it is not: the surprising future behavior (for
someone not knowing the special preparation) shows that some correlation variables were
neglected that are needed for a correct description. Indeed, everywhere in science, we strive
to explain surprising behavior by looking for the missing variables needed to describe
the system correctly!

Grad [98] expresses this as "the adoption of a new entropy is forced by the discovery of new
information". More precisely, the adoption of a new model is forced, since the old model
is simply wrong under the new conditions and remains valid only under some restrictions.
Thus, this is not a property of entropy alone, but of all concepts in models of reality relating
to effects that are not observable (in the sense of control theory discussed above).

Observability issues aside, the coarser description usually has a more limited range of
applicability; with the qualification discussed in the example, it is generally restricted to those
systems whose detailed intensive variable vector $\alpha$ is close to the subspace of vectors of the
form $C^T\alpha_c$ reproducible in the coarse model. Finding the right family of thermal variables
is therefore a matter of discovery, not of subjective choice. This is further discussed in
Section 7.2.
7.2 Local, microlocal, and quantum equilibrium

As we have seen in Section 7.1, when descriptions on several levels are justified empirically,
they differ significantly only in quantities which are negligible in the more detailed
models, or by terms which are not observable in principle. Thus, the global equilibrium
description is adequate at some resolution precisely when only small nonequilibrium forces
are present, and a more detailed local equilibrium description will (apart from variations of
the Gibbs paradox, which should be cured on the more detailed level) agree with the global
equilibrium description to the accuracy within which the differences in the corresponding
approximations to the entropy, as measured by the relative entropy (5.41), are negligible.
Of course, if the relative entropy of a thermal state relative to the true Gibbs state is large,
then the thermal state cannot be regarded as a faithful description of the true state of the
system, and the thermal model is inadequate.

In statistical mechanics proper (where the microscopic dynamics is given), the relevant
extensive quantities are those whose values vary slowly enough to be macroscopically observable
at a given spatial or temporal resolution (cf. Balian [15]). Which ones must
be included is a difficult mathematical problem which has been solved only in simple situations
(such as monatomic gases) where a weak coupling limit applies. In more general
situations, the selection is currently based on phenomenological considerations, without any
formal mathematical support.

In equilibrium statistical mechanics, which describes time-independent, global equilibrium
situations, the relevant extensive quantities are the additive conserved quantities of a
microscopic system and additional parameters describing order parameters that emerge from
broken symmetries or various defects not present in the ideal model. Phase equilibrium
needs, in addition, copies of the extensive variables (e.g., partial volumes) for each phase,
since the phases are spatially distributed, while the intensive variables are shared by all
phases. Chemical equilibrium also accounts for exchange of atoms through a fixed list
of permitted chemical reactions whose length is again determined by the desired resolution.

In states not corresponding to global equilibrium (usually called non-equilibrium states),
a thermal description is still possible assuming so-called local equilibrium. There, the
natural extensive quantities are those whose values are locally additive and slowly varying
in space and time and hence reliably observable at the scales of interest. In the statistical
mechanics of local equilibrium, the thermal variables therefore become space- and
time-dependent fields (Robertson [212]). On even shorter time scales, phase space behavior
becomes relevant, and the appropriate description is in terms of microlocal equilibrium
and position- and momentum-dependent phase space densities. Finally, on the microscopic
level, a linear operator description in terms of quantum equilibrium is needed.

The present formalism is still applicable to local, microlocal, and quantum equilibrium 
(though most products now become inner products in suitable function spaces), but the 
relevant quantities are now time-dependent and additional dynamical issues (relating states 
at different times) arise which are outside the scope of the present book. 

In local equilibrium, one needs a hydrodynamic description by Navier-Stokes equations and
their generalizations; see, e.g., Beris & Edwards [29], Öttinger [186], Edwards et al.
[67]. In the local view, one gets the interpretation of extensive variables as locally conserved
(or at least slowly varying) quantities (whence additivity) and of intensive variables as
parameter fields, which cause non-equilibrium currents when they are not constant, driving
the system towards global equilibrium. In microlocal equilibrium, one needs a kinetic
description by the Boltzmann equation and its generalizations; see, e.g., Bornath et al.
[43], Calzetta & Hu [50], Müller & Ruggeri [177].

Quantum equilibrium. Full microscopic dynamics must be based on quantum mechanics. 
In quantum equilibrium, the dynamics is given by quantum dynamical semigroups. We 
outline the ideas involved, in order to emphasize some issues which are usually swept under 
the carpet. 

Even when described at the microscopic level, thermal systems of sizes handled in a
laboratory are in contact with their environment, via containing walls, emitted or absorbed
radiation, etc. We therefore embed the system of interest into a bigger, completely isolated
system and assume that the quantum state of the big system is described at a fixed time by
a normalized wave function $\psi$ in some Hilbert space $\mathcal H$. (Assuming instead a mixed state
given by a density operator would not alter the picture significantly.) The value of a linear
operator $g$ in the big system is

$$\langle g\rangle = \psi^*g\psi; \tag{7.3}$$

cf. (6.58). The small system is defined by a Euclidean *-algebra $E$ of linear operators densely
defined on $\mathcal H$, composed of all meaningful expressions in field operators at arguments in the
region of interest, with integral given by the trace in the big system. Since (7.3), restricted
to $g \in E$, satisfies the rules (R1)-(R4) for a state, the big system induces a state on the system
of interest. By standard theorems (see, e.g., Thirring [240]), there is a unique
density operator $\rho \in E$ such that $\langle g\rangle = \int\rho g$ for all $g \in E$ with finite value. Moreover, $\rho$
is Hermitian and positive semidefinite. If 0 is not an eigenvalue of $\rho$ (which will usually be
the case), then $\langle\cdot\rangle$ is a Gibbs state with entropy $S = -k\log\rho$. To put quantum equilibrium
into the thermal setting, we need to choose as extensive variables a family spanning the
algebra $E$; then each such $S$ can be written in the form (6.1).
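In the simplest special case, where the big system is a tensor product of the system of interest with its environment (an assumption; the text allows much more general algebras $E$), the induced density operator corresponds, up to the normalization convention for the integral, to the partial trace of $\psi\psi^*$ over the environment. The Python sketch below takes two qubits in a hypothetical entangled pure state, forms the 2×2 reduced density matrix, and evaluates the mean entropy $-k\,\mathrm{tr}\,\rho\log\rho$ (with $k = 1$) from its eigenvalues. Although the big system is in a pure state, the system of interest is not.

```python
import cmath
import math


def reduced_density(psi):
    """Partial trace over the second qubit of |psi><psi|.

    psi = [a00, a01, a10, a11] holds amplitudes a_ij, where i labels the
    system of interest and j the environment."""
    a = [[psi[0], psi[1]], [psi[2], psi[3]]]
    return [[sum(a[i][j] * a[k][j].conjugate() for j in range(2)) for k in range(2)]
            for i in range(2)]


def mean_entropy(rho):
    """-tr(rho log rho) from the eigenvalues of a 2x2 Hermitian matrix."""
    t = (rho[0][0] + rho[1][1]).real                            # trace
    d = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real    # determinant
    disc = math.sqrt(max(t * t / 4 - d, 0.0))
    eigs = [t / 2 + disc, t / 2 - disc]
    return -sum(p * math.log(p) for p in eigs if p > 1e-15)


theta = math.pi / 6  # hypothetical mixing angle
psi = [math.cos(theta), 0.0, 0.0, cmath.exp(0.3j) * math.sin(theta)]
rho = reduced_density(psi)
print(rho, mean_entropy(rho))
```

For this state the reduced density matrix is diagonal with entries 3/4 and 1/4, so 0 is not an eigenvalue and $S = -k\log\rho$ is well defined; a product state gives a pure reduced state with zero mean entropy, for which $\log\rho$ does not exist.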

Of course, $\psi$ and hence the state $\langle\cdot\rangle$ depend on time. If the reduced system were governed
by a Schrödinger equation, then $\rho$ would evolve by means of a unitary evolution; in particular,
$\bar S = \langle S\rangle = -k\,\mathrm{tr}\,\rho\log\rho$ would be time-independent. However, the system of interest does
not inherit a Schrödinger dynamics from the isolated big system; rather, the dynamics of
$\rho$ is given by an integro-differential equation with a complicated memory term, defined by
the projection operator formalism described in detail in Grabert [97]; for summaries, see
Rau & Müller [203] and Balian [15]. In particular, one can say nothing specific about
the dynamics of $\bar S$.

In typical treatments of such reduced descriptions, one assumes that the memory decays
sufficiently fast; this so-called Markov assumption can be justified in a weak coupling
limit (Davies [148], Spohn [231]), which corresponds to a system of interest nearly
independent of the environment. But a typical thermal system, such as a glass of water on
a desk, is held in place by the container. Considered as a nearly independent system, the
water would behave very differently, probably diffusing into space. Thus, it is questionable
whether the Markov assumption is satisfied; a detailed investigation of the situation would
be highly desirable. There appear to be only a few discussions of how containers
modify the dynamics of a large quantum system; see, e.g., Lebowitz & Frisch [155],
Blatt [32], and Ridderbos [208]. One should expect a decoherence effect (Brune et
al. [73]) of the environment on the system which, for large quantum systems, is extremely
strong (Zurek [270]). A fundamental derivation should be based on quantum field theory;
the so-called exact renormalization group equations (see, e.g., Polonyi & Sailer [198],
Berges [28]) have a thermal flavor and might be a suitable starting point.

However, simply adopting the Markov assumption as the condition for regarding the system
of interest as effectively isolated allows one to deduce, for the resulting Markov
approximation, a deterministic differential equation for the density operator. The dynamics
then describes a linear quantum dynamical semigroup. All known linear quantum
dynamical semigroups (cf. Davies [148]) on a Hilbert space correspond to a
dynamics in the form of a Lindblad equation

$$\dot\rho = \frac{i}{\hbar}(\rho H - H^*\rho) + P^*\rho \tag{7.4}$$

(Lindblad [157], Gorini et al. [95]), where the effective Hamiltonian $H$ is a not
necessarily Hermitian operator and $P^*$ is the dual of a completely positive map $P$ of the
form

$$P(f) = Q^*J(f)Q \quad\text{for all } f \in E, \tag{7.5}$$

with some linear operator $Q$ from $E$ to a second *-algebra $E'$ and some *-algebra
homomorphism $J$ from $E$ to $E'$ (Stinespring [235], Davies [148, Theorem 2.1]). Such dynamics
is inherently dissipative; for time $t \to \infty$, $P^*\rho$ can be shown to tend to zero, which usually
implies that, apart from a constant velocity, the limiting state is a global equilibrium state.

Thus, the irreversibility of the time evolution is apparent already at the quantum level,
being caused by the fact that all our observations are done in a limited region of space.
The prevalence here on earth of matter in approximate equilibrium could therefore possibly
be explained by the fact that the earth and with it most of its materials are extremely old.

For a system reasonably isolated (in the thermodynamical sense) from its environment, one
would expect $H$ to contain a confining effective potential well and $P$ to be small. It would
be interesting to understand the conditions (if there are any) under which the dissipation
due to $P$ can be neglected.

We now consider relations within the hierarchy of the four levels. The quantum equilibrium
entropy $S_{qu}$, the microlocal equilibrium entropy $S_{ml}$, the local equilibrium entropy $S_{lc}$, and the
global equilibrium entropy $S_{gl}$ denote the values of the entropy in a thermal description of
the corresponding equilibrium levels. The four levels have a more and more restricted set
of extensive quantities, and the relative entropy argument of Theorem 5.3.3 can be applied
at each level. Therefore

$$S_{qu} \le S_{ml} \le S_{lc} \le S_{gl}.$$
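The relative entropy argument used for the four levels can be illustrated on a toy model; the sketch makes no claim to model actual quantum, microlocal, local, or global equilibrium. The Python code below assumes a hypothetical four-state system with a fixed "true" distribution `RHO` and nested families of Gibbs states built from the first m = 0, 1, 2, 3 of three illustrative quantities (units with $k = 1$). For each family it fits the Gibbs state minimizing the relative entropy to `RHO` by plain gradient descent (step size and iteration count chosen ad hoc) and returns the mean entropy $-\sum_i \rho_i\log\rho_{{\rm fit},i}$ of the fit.

```python
import math

# Hypothetical four-state system: "true" distribution and three quantities.
RHO = [0.1, 0.4, 0.2, 0.3]
QUANTITIES = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]


def fitted_mean_entropy(quantities, steps=20000, rate=0.3):
    """Minimize <S_fit> = theta.<f> + log Z(theta) over the Gibbs family
    rho_fit,i = exp(-theta.f_i) / Z; the gradient is <f>_rho - <f>_fit."""
    theta = [0.0] * len(quantities)
    targets = [sum(r * v for r, v in zip(RHO, f)) for f in quantities]

    def fit(theta):
        w = [math.exp(-sum(t * f[i] for t, f in zip(theta, quantities)))
             for i in range(len(RHO))]
        z = sum(w)
        return [wi / z for wi in w]

    for _ in range(steps):
        p = fit(theta)
        grad = [tg - sum(pi * v for pi, v in zip(p, f))
                for tg, f in zip(targets, quantities)]
        theta = [t - rate * g for t, g in zip(theta, grad)]
    p = fit(theta)
    return -sum(r * math.log(pi) for r, pi in zip(RHO, p))


levels = [fitted_mean_entropy(QUANTITIES[:m]) for m in range(4)]
true_entropy = -sum(r * math.log(r) for r in RHO)
print(levels, true_entropy)
```

Read from m = 0 to m = 3, the four values play the roles of the coarsest to the finest description: they cannot increase as quantities are added, since setting the extra parameters to zero recovers the coarser family, and the largest family, which together with the normalization spans all functions on the four states, recovers the entropy of `RHO` itself.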
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In general the four entropies might have completely different values. There are four essen- 
tially different possibilities, 

(i) 'S'qu ~ Sml ~ jS'ic ~ Sgl, 



with different physical interpretations. As we have seen in Section 7.1, a thermal description 
is valid only if the entropy in this description approximates the true entropy sufficiently 
well. All other entropies, when significantly different, do not correspond to a correct de- 
scription of the system; their disagreement simply means failure of the coarser description 
to match reality. (Again, we disregard variations of the Gibbs paradox which should be 
cured on the fundamental level.) Thus which of the cases (i)~(iv) occurs decides upon 
which descriptions are valid, (i) says that the state is in global equilibrium, and all four 
descriptions are valid, (ii) that the state is in local, but not in global equilibrium, and only 
the three remaining descriptions are valid, (iii) says that the state is in microlocal, but not 
in local equilibrium, and in particular not in global equilibrium. Only the quantum and the 
microlocal descriptions are valid. Finally, (iv) says that the state is not even in microlocal 
equilibrium, and only the quantum description is valid. 

Thus, assuming that the fundamental limitations in obscrability are correctly treated on 
the quantum level, the entropy is an objective quantity, independent of the level of accuracy 
with which we are able to describe the system. The precise value it gets in a model depends, 
however, on the model used and its accuracy. The observation (by Grad [98], Balian [15], 
and others) that entropy may depend significantly on the description level is explained by 
two facts which hold for variables in models of any kind, not just for the entropy, namely 
(i) that if two models disagree in their observable predictions, at most one one of them 
can be correct, and (ii) that if two models agree in their observable predictions, the more 
detailed model has unobservable details. Since unobservable details cannot be put to an 
experimental test, the more detailed model in case (ii) is questionable unless dictated by 
fundamental considerations, such as symmetry or formal simplicity. 



7.3 Statistics and probability 

Recall from Section 5.4 that a quantity g is considered to be significant if its resolution 
res{g) satisfies res{g) <^ 1, while it is considered as noise if res(5') 3> 1. If (7 is a quantity 
and g is a. good approximation of its value then Ag :— g — g is noise. Sufficiently significant 
quantities can be treated as deterministic; the analysis of noise is the subject of statistics. 

Statistics is based on the idea of obtaining information about a noisy system by repeated
sampling from a population¹ of independent systems with identical preparation, but
differing in noisy details not controllable by the preparation. In the present context, such
systems are described by the same Euclidean *-algebra $E_0$, the same set of quantities to be
sampled, and the same state $\langle\cdot\rangle_0$.

¹Physicists usually speak of an ensemble in place of a population; but since in connection with the
microcanonical, canonical, or grand canonical ensemble we use the term ensemble synonymously with state,
we prefer the statistical term population to keep the discussion unambiguous.

More precisely, the systems may be regarded as subsystems of a bigger system (e.g., the
laboratory) whose set of quantities is given by a big Euclidean *-algebra $E$. To model
identically prepared subsystems we consider injective homomorphisms from $E_0$ into $E$ mapping
each reference quantity $f \in E_0$ to the quantity $f_l \in E$ of the $l$th subsystem considered to
be 'identical' with $f$. Of course, in terms of the big system, the $f_l$ are not really identical;
they refer to quantities distinguished by position and/or time. That the subsystems are
identically prepared is instead modelled by the assumption

$$\langle f_l\rangle = \langle f\rangle_0 \quad\text{for all } f \in E_0, \tag{7.6}$$

and that they are independent by the assumption

$$\langle f_k g_l\rangle = \langle f_k\rangle\langle g_l\rangle \quad\text{for all } f, g \in E_0 \text{ and } k \ne l. \tag{7.7}$$

The following result is fundamental for statistical considerations:

7.3.1 Theorem. (Weak law of large numbers)

For a family of quantities $f_l$ ($l = 1, \dots, N$) satisfying (7.6) and (7.7), the mean value

$$\hat f := \frac{1}{N}\sum_{l=1}^{N} f_l$$

(which again is a quantity) satisfies

$$\langle\hat f\rangle = \langle f\rangle_0, \qquad \sigma(\hat f) = \sigma(f)/\sqrt{N}. \tag{7.8}$$

Proof. Write $\bar f := \langle f\rangle_0$ and $\sigma := \sigma(f)$. By (7.6), we have

$$\langle\hat f\rangle = \frac{1}{N}\sum_{l=1}^{N}\langle f_l\rangle = \frac{1}{N}\sum_{l=1}^{N}\bar f = \bar f,$$

and, since the maps $f \mapsto f_l$ are *-homomorphisms,

$$\sigma(f_l) = \sigma(f) = \sigma \quad\text{for all } l.$$

Now

$$\hat f^*\hat f = \frac{1}{N^2}\Big(\sum_j f_j\Big)^*\Big(\sum_k f_k\Big) = \frac{1}{N^2}\sum_{j,k} f_j^* f_k,$$

where

$$\langle f_j^* f_j\rangle = \langle f_j\rangle^*\langle f_j\rangle + \sigma(f_j)^2 = |\bar f|^2 + \sigma^2,$$

and by (7.7), for $j \ne k$,

$$\langle f_j^* f_k + f_k^* f_j\rangle = 2\,\mathrm{Re}\,\langle f_j^* f_k\rangle = 2\,\mathrm{Re}\,\langle f_j\rangle^*\langle f_k\rangle = 2\,\mathrm{Re}\,\bar f^*\bar f = 2|\bar f|^2.$$

Adding the $N$ diagonal terms and the $N(N-1)/2$ pairs with $j < k$ gives
$\langle\hat f^*\hat f\rangle = N^{-2}\big(N\sigma^2 + N^2|\bar f|^2\big) = \sigma^2/N + |\bar f|^2$. Hence

$$\sigma(\hat f)^2 = \langle\hat f^*\hat f\rangle - \langle\hat f\rangle^*\langle\hat f\rangle = \sigma^2/N,$$

and the assertions follow. □
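In the commutative special case, where the quantities are classical random variables and (7.6) and (7.7) reduce to identical distribution and independence, the scaling (7.8) can be checked by simulation. The following Python sketch is only an illustration under that assumption; the exponential distribution, the sample sizes, and the function name `std_of_mean` are our own arbitrary choices, not taken from the text.

```python
import numpy as np

def std_of_mean(n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Empirical standard deviation of the mean of n independent, identically
    distributed samples. Assumption: each sample is exponential with scale 1,
    so that sigma(f) = 1 exactly."""
    rng = np.random.default_rng(seed)
    # each row plays the role of one population of n identically prepared systems
    samples = rng.exponential(scale=1.0, size=(trials, n))
    means = samples.mean(axis=1)  # one realization of the mean value per row
    return float(means.std())

for n in (1, 4, 16, 64):
    # the two columns should nearly agree, as predicted by (7.8) with sigma(f) = 1
    print(n, round(std_of_mean(n), 3), round(1 / np.sqrt(n), 3))
```

Since this distribution has mean 1, the relative accuracy of the averaged quantity also shrinks like $1/\sqrt{N}$.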



As a significant body of work in probability theory shows, the conditions under which
$\sigma(\hat f) \to 0$ as $N \to \infty$ can be significantly relaxed; thus in practice, it is sufficient if (7.6)
and (7.7) are approximately valid.

The significance of the weak law of large numbers lies in the fact that, by (7.8), $\sigma(\hat f)$ becomes
arbitrarily small as $N$ becomes sufficiently large. Thus the uncertainty of quantities
averaged over a large population of identically prepared systems becomes arbitrarily small,
while the mean value reproduces the value of each quantity. Consequently, quantities averaged over
a large population of identically prepared systems become highly significant when their
value is nonzero, even when no single quantity is significant.

This explains the success of statistical mechanics in providing an effectively deterministic
description of ideal gases, where all particles may be assumed to be independent and
identically prepared. In real, nonideal gases, the independence assumption is only approximately
valid because of possible interactions, and in liquids, the independence is completely lost.
The power of the abstract theory discussed in the preceding chapters lies in the fact that it
allows one to replace simple statistical reasoning based on independence by more sophisticated
algebraic techniques that give answers even in extremely complex interacting cases.

The weak law of large numbers also implies that, in a context where many repeated
experiments are feasible, states can be given a frequentist interpretation, in which $\langle g\rangle$ is the
expectation of $g$, empirically defined as an average over many realizations. In this case
(and only in this case), $\mathrm{res}(g)$ becomes the standard deviation of $g$ divided by the
absolute value of the expectation; therefore, it measures the relative accuracy of the individual
realizations.

On the other hand, in equilibrium thermodynamics, where a tiny number of macroscopic 
observations on a single system completely determine its state to engineering accuracy, 
such a frequentist interpretation is inappropriate. Indeed, as discussed by Sklar [228], 
a frequentist interpretation of statistical mechanics has significant foundational problems, 
already in the framework of classical physics. 

Thus, the present framework correctly captures experimental practice, and determines
the precise conditions under which deterministic and statistical reasoning are justified:

Deterministic reasoning is sufficient for all quantities whose limit resolution is below the
desired relative accuracy. Statistical reasoning is necessary precisely when the limit resolution
of the average quantity is larger than the desired accuracy.

In this way, we delegate statistics to its role as the art of interpreting measurements, as
in classical physics. Indeed, to have a consistent interpretation, real experiments must
be designed such that they allow one to determine approximately the properties of the
state under study, hence the values of all quantities of interest. The uncertainties in the
experiments imply approximations, which, if treated probabilistically, need an additional
probabilistic layer accounting for measurement errors. Expectations from this secondary
layer, which involve probabilistic statements about situations that are uncertain due to
neglected but in principle observable details (cf. Peres [194]), happen to have the same
formal properties as the values on the primary layer, though their physical origin and
meaning are completely different.



Classical probability. Apart from the traditional axiomatic foundation of probability
theory by Kolmogorov [139] in terms of measure theory, there is a less well-known
axiomatic treatment by Whittle [259] in terms of expectations, which is essentially the
commutative case of the present setting. The exposition in Whittle [259] (or, in more
abstract terms, already Gelfand & Naimark [89]) shows that, if the $X_j$ are pairwise
commuting, it is possible to define for any Gibbs state in the present sense random
variables $X_j$ in Kolmogorov's sense such that the expectation of all sufficiently regular functions
$f(X)$ defined on the joint spectrum of $X$ agrees with the value of $f(X)$. It follows that in
the pairwise commuting case, it is always possible to construct a probability interpretation
for the quantities, completely independent of any assumed microscopic reality.

The details (which the reader unfamiliar with measure theory may simply skip) are as
follows. We may associate with every vector $X$ of quantities with commuting components
a time-dependent, monotone linear functional $\langle\cdot\rangle_t$ defining the expectation

$$\langle f(X)\rangle_t = \int \rho(t) f(X)$$

at time $t$ of arbitrary bounded continuous functions $f$ of $X$. These functions define a
commutative *-algebra $E(X)$. The spectrum $\mathrm{Spec}\,X$ of $X$ is the set of all *-homomorphisms
(often called characters) from $E(X)$ to $\mathbb{C}$, and has the structure of a Hausdorff space, with
the weak-* topology obtained by calling a subset $S$ of $\mathrm{Spec}\,X$ closed if, for any pointwise
convergent sequence (or net) contained in $S$, its limit is also in $S$. Now a monotone linear
functional turns out to be equivalent to a multivariate probability measure $d\mu_t(X)$ (on the
sigma algebra of Borel subsets of the spectrum $\Omega$ of $X$) defined by

$$\int d\mu_t(X)\, f(X) := \langle f(X)\rangle_t.$$

Conversely, classical probability theory may be discussed in terms of the Euclidean *-algebra
of random variables, i.e., Borel measurable complex-valued functions on a Hausdorff
space $\Omega$ where bounded continuous functions are strongly integrable and the integral is
given by $\int f := \int d\mu(X)\, f(X)$ for some distinguished measure $\mu$.

If, as in quantum systems, the extensive quantities do not commute, a probabilistic
interpretation in the Kolmogorov sense is no longer possible. In Section 7.5, we discuss
what may take its place.
7.4 Classical measurements

A measuring instrument measures properties of a system of interest. However, the measured 
value is read off from the instrument, and hence is primarily a property of the measuring 
instrument and not one of the measured system. On the other hand, properties of the 
system are encoded in the state of the system and its dynamics. This state and what can 
be deduced from it are the only objective properties of the system. 

In order that a measurement on a system deserves its name there must be a quantitative 
relation between the state of the system and the measured values. This relation may be 
deterministic or stochastic, depending on what is measured. 

Measurements are therefore possible only if the microscopic laws imply relations between 
properties of the measured system and the values read off from the measuring instru- 
ment. These relations may be either deduced from a theoretical analysis, or they may be 
guessed from experimental evidence. In general, the theoretical analysis leads to difficult 
many-particle problems that can be solved only in a stochastic approximation of idealized 
situations; from such idealizations one then transfers insight to make educated guesses in 
cases where an analysis is too difficult. 

The behavior required in the following discussion for a classical or a statistical instrument
guarantees reproducibility of measurements, a basic requirement of natural sciences, in
the sense that systems prepared in the same state will behave alike when measured. Here
'alike' is interpreted for classical instruments in the deterministic sense of 'approximately
equal within the specified accuracy', and for statistical instruments in the sense of
'reproducing in the long run approximately the same probabilities and mean values'.

When measuring classical or quantum systems that are macroscopic, i.e., large enough to
be described sufficiently well by the methods of statistical mechanics, one
measures more or less accurately extensive or intensive variables of the system and one
obtains essentially deterministic results. A classical instrument is a measuring instrument
which measures such deterministic values within some margin of accuracy. Note that this
gives an operational meaning to the term classical, although every classical instrument is, of
course, a quantum mechanical many-particle system when modelled in full detail. Whether
a particular instrument behaves classically can in principle be found out by an analysis of
the measurement process considered as a many-particle system, although the calculations
can be done in practice only under simplifying assumptions. For some concrete models,
see, e.g., Allahverdyan et al. [6]. Thus there is no split between the classical and the
quantum world but a gradual change from quantum to classical as the system gets larger
and the limit resolution improves.

It is interesting to discover the nature of thermodynamic observables². We encountered
intensive variables, which are parameters characterizing the state of the system, extensive
variables, values which are functions of the intensive variables and of the parameters (if
there are any) in the Hamiltonian, and limit resolutions, which, as functions of values, are
also functions of the intensive variables. Thus all thermodynamic observables of practical
interest are functions of the parameters defining the thermal state or the Hamiltonian of
the system. Which parameters these are depends, of course, on the assumed model.

²We use the term observable with its common-sense meaning. In quantum mechanics, the term also has
a technical meaning, which we do not use, denoting a self-adjoint linear operator on a Hilbert space.

By a natural step of extrapolation, substantiated later (in Section 13.1) by the Dirac-Frenkel
variational principle, we may consider, for an arbitrary model of an arbitrary system, the
parameters characterizing a family of Hamiltonians and a family of states which describe
the possible states of a system as the basic numbers whose functions define the observables
of the model. We call these parameters the model parameters; the values of the
model parameters completely characterize a particular system described by the model.

Thus we may say that a classical instrument is characterized by the fact that upon
measurement the measurement result approximates with a certain accuracy the value of a function
$F$ of the model parameters. As customary, one writes the result of a measurement as an
uncertain number $F_0 \pm \Delta F$ consisting of a main value $F_0$ and a deviation $\Delta F$, with the
meaning that the error $|F_0 - F|$ is at most a small multiple of $\Delta F$. Because of possible
systematic errors, it is generally not possible to interpret $F_0$ as mean value and $\Delta F$ as
standard deviation. Such an interpretation is valid only if the instrument is calibrated to
satisfy the implied statistical relation.

In particular, since $\langle f\rangle$ is a function of the model parameters, a measurement may yield
the value $\langle f\rangle$ of a quantity $f$, and is then said to be a classical instrument for measuring
$f$. As an important special case, all readings from a photographic image or from the scale
of a measuring instrument, done by an observer, are of this nature when considered as
measurements of the instrument by the observer. Indeed, what is measured by the eye
is the particle density of blackened silver on a photographic plate or of iron at the tip of
the pointer on the scale, and these are extensive variables in a continuum mechanical local
equilibrium description of the instrument.

The measurement of a tiny, microscopic system, often consisting of only a single particle,
is of a completely different nature. Now the limit resolutions do not benefit from the law
of large numbers, and the relevant quantities often are no longer significant. Then the
necessary quantitative relations between properties of the measured system and the values
read off from the measuring instrument are only visible as stochastic correlations. In a
single measurement of a microscopic system, one can glean only very little information
about the state of the system; conversely, from the state of the system one can predict only
probabilities for the results of a single measurement. The results of single measurements
are no longer reproducibly observable numbers; only probabilities and statistical mean
values are reproducibly observable, and hence the carriers of scientific information.

To obtain comprehensive information about the state of a single microscopic system is 
therefore impossible. To collect enough information about the prepared state and hence 
the state of each system measured, one needs either time-resolved measurements on a 
single system (available, e.g., for atoms in ion traps or for electrons in quantum dots), or a 
population of identically prepared systems. 
Extrapolating from the macroscopic case, it is natural to consider again the parameters
characterizing a family of states which describe the possible states of a system as the basic 
numbers whose functions define observables in the present, nontechnical sense. This is now 
a less well-founded assumption based only on the lack of a definite boundary between the 
macroscopic and the microscopic regime, and an application of Ockham's razor to minimize 
the needed assumptions. 

Measurements in the form of clicks, flashes or events (particle tracks) in scattering
experiments may be described in terms of a statistical instrument characterized by a discrete
family of possible measurement results $a_1, a_2, \dots$, which may be real or complex numbers,
vectors, or fields, and nonnegative Hermitian quantities $P_1, P_2, \dots$ satisfying

$$P_1 + P_2 + \dots = 1, \tag{7.9}$$

such that the instrument gives the result $a_k$ with probability

$$p_k = \langle P_k\rangle \tag{7.10}$$

if the measured system is in the state $\langle\cdot\rangle$. The nonnegativity of the $P_k$ implies that all
probabilities are nonnegative, and (7.9) guarantees that the probabilities always add up to
1.
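For a finite-dimensional system whose state is given by a density matrix $\rho$ through the trace formula $\langle f\rangle = \mathrm{tr}\,\rho f$ (stated in Section 7.5), the probabilities (7.10) can be computed directly. The Python sketch below assumes NumPy; the function name `povm_probabilities` and the three-outcome instrument on a two-level system are hypothetical illustrations, not taken from the text.

```python
import numpy as np

def povm_probabilities(rho, povm, tol=1e-10):
    """Return p_k = <P_k> = tr(rho P_k) as in (7.10), after checking (7.9) and
    that every P_k is Hermitian and positive semidefinite."""
    dim = rho.shape[0]
    if not np.allclose(sum(povm), np.eye(dim), atol=tol):
        raise ValueError("the P_k do not add up to 1, violating (7.9)")
    for P in povm:
        if not np.allclose(P, P.conj().T, atol=tol):
            raise ValueError("P_k is not Hermitian")
        if np.linalg.eigvalsh(P).min() < -tol:
            raise ValueError("P_k is not positive semidefinite")
    return np.array([np.trace(rho @ P).real for P in povm])

# hypothetical example: three outcomes on a two-level system,
# P_k = (2/3) v_k v_k^T with unit vectors v_k at angles 0, 120 and 240 degrees
angles = 2 * np.pi * np.arange(3) / 3
povm = [(2 / 3) * np.outer([np.cos(a), np.sin(a)], [np.cos(a), np.sin(a)]) for a in angles]
rho = np.diag([1.0, 0.0])  # system prepared in the first basis state
print(povm_probabilities(rho, povm))  # approximately 2/3, 1/6, 1/6
```

The checks mirror the two requirements in the text: (7.9) makes the probabilities sum to 1, and positivity of each $P_k$ makes them nonnegative.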



An instructive example is the photoelectric effect, the measurement of a classical free
electromagnetic field by means of a photomultiplier. A detailed discussion is given in
Chapter 9 of Mandel & Wolf [161]; here we only give an informal summary of their
account.

Classical input to a quantum system is conventionally represented in the Hamiltonian of
the quantum system by an interaction term containing the classical source as an external
field or potential. In the semiclassical analysis of the photoelectric effect, the detector is
modelled as a many-electron quantum system while the incident light triggering the detector
is modelled as an external electromagnetic field. The result of the analysis is that if the
classical field consists of electromagnetic waves (light) with a frequency exceeding some
threshold, then the detector emits a random stream of photoelectrons with a rate which, for
not too strong light, is proportional to the intensity of the incident light. The predictions
are quantitatively correct for normal light.

The response of the detector to the light is statistical, and only the rate (a short time mean)
with which the electrons are emitted bears a quantitative relation to the intensity. Thus
the emitted photoelectrons form a statistical measurement of the intensity of the incident
light.

The results of this analysis are somewhat surprising: The discrete nature of the electron
emissions implies that a photodetector responds to classical light as if it were composed of
randomly arriving photons (the explanation of the photoeffect for which Einstein received
the Nobel prize), although the semiclassical model used to derive the quantitatively correct
predictions does not involve photons at all!
This shows the importance of differentiating between prepared states of the system (here
of classical light) and measured events in the instrument (here the amplified emitted elec- 
trons). The measurement results are primarily a property of the instrument, and their 
interpretation as a property of the system needs theoretical analysis to be conclusive. 



7.5 Quantum probability 

Although quantum mechanics generally counts as an intrinsically statistical theory, it is 
important to realize that it not only makes assertions about probabilities but also makes 
many deterministic predictions verifiable by experiment. 

These deterministic predictions fall into three classes: 

(i) Predictions of numerical values believed to have a precise value in nature: 

• The most impressive proof of the correctness of quantum field theory in microphysics
is the magnetic moment of the electron, predicted by quantum electrodynamics
(QED) in agreement with the experimental value to the phenomenal accuracy of 12
significant digits. It is a universal constant, determined solely by the two parameters in
QED, the electron mass and the fine structure constant.

• QED also predicts correctly emission and absorption spectra of atoms and molecules, 
both the spectral positions and the corresponding line widths. 

• Quantum hadrodynamics allows the prediction of the masses of all isotopes of the 
chemical elements in terms of models with only a limited number of parameters. 

(ii) Predictions of qualitative properties, or of numerical values believed to be not exactly 
determined but which are accurate with a high, computable limit resolution. 

• QED predicts correctly the color of gold, the liquidity of mercury at room tempera- 
ture, and the hardness of diamond. 

• Quantum mechanics enables the computation of thermodynamic equations of state for
a huge number of materials. Equations of state are used in engineering in a
deterministic manner.

• From quantum mechanics one may also compute transport coefficients for
deterministic kinetic equations used in a variety of applications.

Thus quantum mechanics makes both deterministic and stochastic assertions, depending
on which system it is applied to and the state or the variables to be determined.
Statistical mechanics, as discussed in Chapters 5 and 6, is mainly concerned with deterministic
predictions of class (ii) in the above classification.






Interestingly, our definition of classical instruments also covers joint position-momentum
measurements of coherent states, the quantum states discussed in Section 14.7. They are
parameterized by position and momentum, and describe single quantum particles with essentially
classical trajectories, such as can be seen as particle tracks on photographic plates
or in bubble chambers. The deterministic nature of the recorded tracks is due to the
interaction of such a particle with the many-particle system formed by the recording device.

Predictions of class (i) are partly related to spectral properties of the Hamiltonian of a 
quantum system, which we shall discuss in Chapter 17, and partly properties deduced from 
form factors, which are deterministic byproducts of scattering calculations. In both cases, 
classical measurements account adequately for the experimental record. 

Particle scattering itself, however, is a typical stochastic phenomenon. The same holds for 
radioactive decay, when modelled on the level of individual particles; it needs a stochastic 
description as a branching process, similar to classical birth and death processes in biological
population dynamics. In the remainder of this section, we consider the fundamental aspects 
of this stochastic part of quantum mechanics. 

A statistical instrument in the quantum case is mathematically equivalent to what is
called in the literature a positive operator-valued measure, short POVM, defined
as a family $P_1, P_2, \dots$ of Hermitian, positive semidefinite operators satisfying (7.9) (or a
continuous generalization of this). They originated around 1975 in work by Helstrom
[112] on quantum detection and estimation theory and are discussed in some detail in
Peres [194]. They describe the most general quantum measurement of interest in quantum
information theory. Which operators $P_k$ correctly describe a statistical instrument can in
principle be found out by suitable calibration measurements. Indeed, if we feed the
instrument with enough systems prepared in known states $\langle\cdot\rangle_j$, we can measure approximate
probabilities $p_{jk} \approx \langle P_k\rangle_j$. By choosing the states diverse enough, one may approximately
reconstruct $P_k$ from this information by a process called quantum tomography. In
quantum information theory, the Hilbert spaces are finite-dimensional, hence the quantities
form some algebra $E = \mathbb{C}^{N\times N}$; then $N^2$ values $\langle P_k\rangle_j$ for linearly independent states suffice
for this reconstruction. The optimal reconstruction using a minimal number of individual
measurements is the subject of quantum estimation theory, still an active frontier of
research.
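The reconstruction of a POVM element from calibration data is linear in $P_k$, since $\langle P_k\rangle_j = \mathrm{tr}\,\rho_j P_k$ when the $j$th known state has density matrix $\rho_j$. The following Python sketch, assuming NumPy and exact (noise-free) probabilities, solves this linear system by least squares. The probe states are an assumed choice (any $N^2$ linearly independent states would do), and the names `probe_states` and `reconstruct_effect` are hypothetical.

```python
import numpy as np

def probe_states(n):
    """n*n density matrices phi phi^* for phi = e_j and, for j < k,
    (e_j + e_k)/sqrt2 and (e_j + i e_k)/sqrt2 (an assumed choice of known states)."""
    e = np.eye(n, dtype=complex)
    vecs = list(e)
    for j in range(n):
        for k in range(j + 1, n):
            vecs.append((e[j] + e[k]) / np.sqrt(2))
            vecs.append((e[j] + 1j * e[k]) / np.sqrt(2))
    return [np.outer(v, v.conj()) for v in vecs]

def reconstruct_effect(states, probs):
    """Least-squares estimate of a POVM element P from p_j = tr(rho_j P)."""
    n = states[0].shape[0]
    # tr(rho P) = sum_{a,b} rho[b, a] P[a, b], so row j holds the entries of rho_j^T
    A = np.array([rho.T.ravel() for rho in states])
    x, *_ = np.linalg.lstsq(A, np.asarray(probs, dtype=complex), rcond=None)
    P = x.reshape(n, n)
    return (P + P.conj().T) / 2  # enforce Hermiticity against round-off and noise

# usage: calibrate against exactly known probabilities of a hypothetical detector
P_true = np.array([[0.7, 0.1j], [-0.1j, 0.2]])
states = probe_states(2)
probs = [np.trace(rho @ P_true).real for rho in states]
print(np.round(reconstruct_effect(states, probs), 6))
```

Least squares rather than a square solve is used so that the same routine accepts more than $N^2$ states, which is the usual situation when the probabilities are only estimated frequencies.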

Before 1975, quantum measurements used to be described in terms of ideal statistical
measurements, the special case of POVMs where the $P_k$ form a family of orthogonal
projectors, i.e., linear operators satisfying

$$P_k^2 = P_k = P_k^*, \qquad P_j P_k = 0 \quad\text{for } j \ne k,$$

onto the eigenspaces of a self-adjoint operator $A$ (or the components of a vector $A$ of
commuting, self-adjoint operators) with discrete spectrum given by $a_1, a_2, \dots$. In this case, the
statistical instrument is said to perform ideal measurements of $A$, and the rule (7.10)
defining the probabilities is called Born's rule. The rule is named after Max Born, who
derived it in 1926 in the special case of pure states (defined in (6.59)), and who was awarded
the Nobel prize in 1954 for this insight, crucial at that time for understanding the nature
of quantum mechanics.
Ideal measurements of $A$ have quite strong properties since under the stated assumptions,
the instrument-based statistical average

$$\overline{f(a)} := p_1 f(a_1) + p_2 f(a_2) + \dots$$

agrees for all functions $f$ defined on the spectrum of $A$ with the model-based value $\langle f(A)\rangle$.
On the other hand, these strong properties are bought at the price of idealization, which
results in effects incompatible with real measurements. For example, according to Born's
rule, the ideal measurement of the energy of a system whose Hamiltonian $H$ is discrete
always yields an exact eigenvalue of $H$; the only statistical component is the question which
of the eigenvalues is obtained. This is impossible in a real measurement; the precise
measurement of the Lamb shift, a difference of eigenvalues of the Hamiltonian of the hydrogen
atom, was even worth a Nobel prize (1955 for Willis Lamb).
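The agreement between the instrument-based average and $\langle f(A)\rangle$ can be checked numerically for a Hermitian matrix $A$ and a density matrix $\rho$, with $\langle f\rangle = \mathrm{tr}\,\rho f$ as in the trace formula of this section. The Python sketch below, assuming NumPy, builds the projectors from an eigendecomposition, applies Born's rule (7.10), and defines $f(A)$ through the spectral decomposition; the function name and the example matrices are our own illustrations.

```python
import numpy as np

def ideal_measurement_check(A, rho, f):
    """Return (instrument-based average sum_k p_k f(a_k), model value tr(rho f(A)))
    for an ideal measurement of the Hermitian matrix A in the state rho."""
    eigvals, eigvecs = np.linalg.eigh(A)
    # rank-one projectors v_k v_k^*; for a degenerate eigenvalue they add up to
    # the projector onto its eigenspace, which leaves both numbers unchanged
    projectors = [np.outer(v, v.conj()) for v in eigvecs.T]
    probs = np.array([np.trace(rho @ P).real for P in projectors])  # Born's rule (7.10)
    instrument_average = float(np.sum(probs * f(eigvals)))
    f_of_A = sum(f(a) * P for a, P in zip(eigvals, projectors))  # spectral definition of f(A)
    model_value = float(np.trace(rho @ f_of_A).real)
    return instrument_average, model_value

A = np.array([[1.0, 1j], [-1j, -1.0]])  # hypothetical observable
rho = np.diag([0.7, 0.3])  # hypothetical state
print(ideal_measurement_check(A, rho, np.exp))  # the two entries agree
```

The check only confirms the formal identity; it says nothing about whether a real instrument realizes the idealized projectors, which is exactly the limitation discussed above.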

There is, however, an important special case of Born's rule which frequently applies
essentially exactly. An ideal binary statistical measurement, e.g., the click of a detector, is
described by a single orthogonal projector $P$; the POVM is then given by $P_1 = P$ for the
measurement result $a_1 = 1$ (click) and by $P_2 = 1 - P$ for the measurement result $a_2 = 0$
(no click). In particular, a test for a state $\varphi$ with $\varphi^*\varphi = 1$ is an ideal binary statistical
measurement with orthogonal projector $P = \varphi\varphi^*$; the reader should check that indeed
$P^2 = P = P^*$. By the above, such a test turns out positive with probability $p = \langle\varphi\varphi^*\rangle$. In
particular, if the system is in a pure state $\psi$, then (6.59) implies that

$$p = \langle\varphi\varphi^*\rangle = \psi^*\varphi\varphi^*\psi = |\varphi^*\psi|^2.$$

This is the well-known squared probability amplitude formula, the original form of
Born's rule. As a consequence, the test for $\varphi$ always turns out positive if the measured
system is in the pure state $\varphi$. However, it also turns out positive with a positive probability
if the measured system is in a pure state different from $\varphi$, as long as it is not orthogonal to
it.
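As a small worked instance (our own illustration), take a two-level system, $\varphi = e_1$, and the pure state $\psi = (\cos\theta, \sin\theta)^T$, so that $\varphi^*\varphi = \psi^*\psi = 1$. Then

$$p = |\varphi^*\psi|^2 = \cos^2\theta,$$

so the test for $\varphi$ is positive with certainty for $\theta = 0$, never for $\theta = \pi/2$ (where $\psi$ is orthogonal to $\varphi$), and with probability $1/2$ for $\theta = \pi/4$.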

By a suitable sequence of ideal binary measurements, it is possible in principle to determine
with arbitrary accuracy the state in which a stationary source of particles prepares the
particles. Indeed, this can be done again with quantum tomography. In the case of $N$-level
systems represented by $E = \mathbb{C}^{N\times N}$, a general state is characterized by its density matrix
$\rho$, a complex Hermitian $N\times N$-matrix with trace one, together with the trace formula

$$\langle f\rangle = \mathrm{tr}\,\rho f.$$

This implies that a set of $N^2 - 1$ tests for specific states, repeated often enough, suffices
for the state determination. Indeed, a test for a unit vector $\varphi$ is positive with probability
$\varphi^*\rho\varphi$. Thus repeated tests for the states $e_j$, the unit vectors with just one entry one and
other entries zero, test the diagonal elements $\rho_{jj}$ of the density matrix, and since the trace
is one, one of these diagonal elements can be computed from the knowledge of all others.
Tests for $(e_j + e_k)/\sqrt{2}$ and $(e_j + i e_k)/\sqrt{2}$ for all $j < k$, which are positive with probabilities
$\tfrac12(\rho_{jj} + \rho_{kk}) + \mathrm{Re}\,\rho_{jk}$ and $\tfrac12(\rho_{jj} + \rho_{kk}) - \mathrm{Im}\,\rho_{jk}$, respectively, then allow the
determination of the $(j,k)$ and $(k,j)$ entries. Thus repetition of a total of
$N - 1 + 2\binom{N}{2} = N^2 - 1$ tests determines the full state. The optimal reconstruction using
a minimal number of individual measurements is again a nontrivial problem of quantum
estimation theory.
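The test-based scheme for $N$-level systems can be written out explicitly. The Python sketch below assumes NumPy and a hypothetical callback `test(phi)` that returns the (exact or estimated) probability $\varphi^*\rho\varphi$ that the test for the unit vector $\varphi$ turns out positive; the function name `reconstruct_state` is also our own.

```python
import numpy as np

def reconstruct_state(test, n):
    """Recover an n x n density matrix from n*n - 1 test probabilities.
    `test(phi)` is a hypothetical callback returning the probability that the
    test for the unit vector phi turns out positive, i.e. phi^* rho phi."""
    e = np.eye(n, dtype=complex)
    rho = np.zeros((n, n), dtype=complex)
    # n - 1 tests for e_j give diagonal entries; the last one follows from tr rho = 1
    for j in range(n - 1):
        rho[j, j] = test(e[j])
    rho[n - 1, n - 1] = 1 - sum(rho[j, j] for j in range(n - 1))
    for j in range(n):
        for k in range(j + 1, n):
            mid = (rho[j, j].real + rho[k, k].real) / 2
            # test for (e_j + e_k)/sqrt2 has probability mid + Re rho_jk
            re = test((e[j] + e[k]) / np.sqrt(2)) - mid
            # test for (e_j + i e_k)/sqrt2 has probability mid - Im rho_jk
            im = mid - test((e[j] + 1j * e[k]) / np.sqrt(2))
            rho[j, k] = re + 1j * im
            rho[k, j] = re - 1j * im
    return rho

# usage with exact probabilities for a known state (for illustration only)
rho_true = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
exact_test = lambda phi: float((phi.conj() @ rho_true @ phi).real)
print(np.round(reconstruct_state(exact_test, 2), 6))
```

With estimated frequencies in place of exact probabilities, the result is only approximately a density matrix and need not even be positive semidefinite; handling this well is part of the estimation problem mentioned in the text.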
7.6 Entropy and unobservable complexity



The concept of entropy also plays an important role in information theory. To connect 
the information theoretical notion of entropy with the present setting, we present in this 
section an informal example of a simple stochastic model in which the entropy has a natural 
information theoretical interpretation. We then discuss what this may teach us about a 
non-stochastic macroscopic view of the situation. 

We assume that we have a simple stationary device which, in regular intervals, delivers a
reading $n$ from a countable set $\mathcal{N}$ of possible readings. For example, the device might count
the number of events of a certain kind in fixed periods of time; then $\mathcal{N} = \{0, 1, 2, \dots\}$.

We suppose that, by observing the device in action for some time, we are led to some
conjecture about the (expected) relative frequencies $p_n$ of readings $n \in \mathcal{N}$; since the device
is stationary, these relative frequencies are independent of time. If $\mathcal{N}$ is finite and not too
large, we might take averages and wait until these stabilize to a satisfactory degree; if $\mathcal{N}$ is
large or infinite, most $n \in \mathcal{N}$ will not have been observed, and our conjecture must depend
on educated guesses.

Clearly, in order to have a consistent interpretation of the $p_n$ as relative frequencies, we
need to assume that each reading is possible:

$$p_n > 0 \quad\text{for all } n \in \mathcal{N}, \tag{7.11}$$

and some reading occurs with certainty:

$$\sum_{n\in\mathcal{N}} p_n = 1. \tag{7.12}$$

For reasons of economy, we shall not allow $p_n = 0$ in (7.11), which would correspond to
readings that are either impossible, or occur too rarely to have a scientific meaning. Clearly,
this is no loss of generality.

Knowing only relative frequencies means that (when $|\mathcal{N}| > 1$) we have only incomplete
information about future readings of the device. We want to calculate the information
deficit by counting the expected number of questions needed to identify a particular reading
unknown to us, but known to someone else who may answer our questions with yes or no.

Consider an arbitrary strategy $s$ for asking questions, and denote by $s_n$ the number of
questions needed to determine the reading $n$. With $q$ questions we can distinguish up to
$2^q$ different cases; but since reading $n$ is already determined after $s_n$ questions, reading $n$
is obtained in $2^{q-s_n}$ of the $2^q$ cases (when $s_n \le q$). Thus

$$\sum_{s_n \le q} 2^{q-s_n} \le 2^q.$$

If we divide by $2^q$ and then make $q$ arbitrarily large we find that

$$\sum_{n\in\mathcal{N}} 2^{-s_n} \le 1. \tag{7.13}$$
It is not difficult to construct strategies realizing the $s_n$ whenever (7.13) holds.
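For integer $s_n$, one standard way to realize such a strategy is a canonical prefix code: each reading receives a distinct string of $s_n$ yes/no answers, no string being the beginning of another, and the $i$th question asks whether the $i$th answer for the unknown reading is 'yes'. This construction is our choice and not necessarily the one intended in the text; the Python sketch below (function name hypothetical) checks (7.13) exactly with rational arithmetic and then assigns the strings in order of increasing length.

```python
from fractions import Fraction

def question_strategy(lengths):
    """Map each reading n to a yes/no answer string of length s_n = lengths[n]
    such that no string is a prefix of another (a canonical prefix code).
    Requires integers s_n >= 0 satisfying (7.13)."""
    if sum(Fraction(1, 2 ** s) for s in lengths.values()) > 1:
        raise ValueError("the s_n violate (7.13)")
    codes, code, prev = {}, 0, 0
    for n, s in sorted(lengths.items(), key=lambda item: item[1]):
        code <<= s - prev  # pad the next free string to length s
        codes[n] = format(code, f"0{s}b") if s > 0 else ""
        code += 1  # next free string of the same length
        prev = s
    return codes

# question i asks: "is the i-th answer for your reading a 1 (yes)?"
print(question_strategy({"a": 1, "b": 2, "c": 3, "d": 3}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

Condition (7.13) is exactly what guarantees that the counter `code` never needs more than $s_n$ binary digits, so every reading gets a string of the required length.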

Since we do not know the reading in advance, we cannot determine the precise number
of questions needed in a particular unknown case. However, knowledge of the relative
frequencies allows us to compute the average number of questions needed, namely

$$\bar s = \sum_{n\in\mathcal{N}} p_n s_n. \tag{7.14}$$

To simplify notation, we introduce the abbreviation

$$\int f := \sum_{n\in\mathcal{N}} f_n \tag{7.15}$$

for every quantity $f$ indexed by the elements of $\mathcal{N}$, and we use the convention that
inequalities, operations and functions of such quantities are understood componentwise.
Then we can rewrite (7.11)-(7.14) as

$$p > 0, \qquad \int p = 1, \tag{7.16}$$

$$\bar s = \int p s, \qquad \int 2^{-s} \le 1, \tag{7.17}$$

where

$$\bar f := \langle f\rangle := \int p f \tag{7.18}$$

is the average of an arbitrary quantity $f$ indexed by $\mathcal{N}$.



We now ask for a strategy which makes the number $\bar s$ as small as possible. However, we
idealize the situation a little by allowing the $s_n$ to be arbitrary nonnegative real numbers
instead of integers only. This is justified when the size of $\mathcal{N}$ is large or infinite since then
most $s_n$ will be large numbers which can be approximated by integers with a tiny relative
error.

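7.6.1 Theorem. The entropy $S$, defined by

$$S := -k\log p, \quad\text{where } k = \frac{1}{\log 2}, \tag{7.19}$$

satisfies $\bar S \le \bar s$, with equality if and only if $s = S$.

Proof. (7.19) implies $\log p = -S\log 2$, hence $p = 2^{-S}$. Therefore

$$2^{-s} = p\,2^{S-s} = p\,e^{(S-s)\log 2} \ge p\big(1 + (S-s)\log 2\big),$$

with equality iff $S = s$. Thus

$$p(S - s) \le \frac{1}{\log 2}\big(2^{-s} - p\big) = k\big(2^{-s} - p\big)$$

and

$$\bar S - \bar s = \int p(S - s) \le \int k\big(2^{-s} - p\big) = k\Big(\int 2^{-s} - \int p\Big) \le k(1 - 1) = 0.$$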
Hence $\bar s \ge \bar S$, and equality holds iff $s = S$. □
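Theorem 7.6.1 can be checked on a small example. The Python sketch below (names hypothetical) evaluates the expected entropy $\bar S = -k\int p\log p$ in bits, i.e., with $k = 1/\log 2$, and the average number (7.14) of questions for a given strategy. For the dyadic frequencies chosen here (an arbitrary illustration), $S = -\log_2 p$ consists of integers, so the optimal strategy is realizable exactly.

```python
import math

def expected_entropy_bits(p):
    """Expected entropy -k sum_n p_n log p_n with k = 1/log 2, i.e. in bits."""
    return -sum(pn * math.log2(pn) for pn in p)

def expected_questions(p, s):
    """Average number of questions sum_n p_n s_n, as in (7.14)."""
    return sum(pn * sn for pn, sn in zip(p, s))

p = [0.5, 0.25, 0.125, 0.125]
S = [-math.log2(pn) for pn in p]  # the optimal strategy s = S of Theorem 7.6.1
print(expected_entropy_bits(p), expected_questions(p, S))  # 1.75 1.75
print(expected_questions(p, [2, 2, 2, 2]))  # another admissible strategy: 2.0
```

For non-dyadic frequencies, $S$ is not integer-valued, and integer strategies can only approach the bound $\bar S$, in line with the idealization to real $s_n$ made above.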

Since (7.19) implies the relation $p = e^{-S/k}$, we have $\langle f\rangle = \int p f = \int e^{-S/k} f$. Thus, the
expectation mapping is a Gibbs state with entropy $S$, explaining the name. Note that
$s = S$ defines an admissible strategy since

$$\sum_{n\in\mathcal{N}} 2^{-S_n} = \sum_{n\in\mathcal{N}} p_n = 1,$$

hence $2^{-S_n} \le 1$, i.e., $S_n \ge 0$ for all $n \in \mathcal{N}$. Thus, the entropy $S$ is the unique optimal
decision strategy. The expected entropy, i.e., the mean number

$$\bar S = \langle S\rangle = \int p S = -k\int p\log p \tag{7.20}$$

of questions needed in an optimal decision strategy, is nonnegative,

$$\bar S \ge 0. \tag{7.21}$$

It measures the information deficit of the device with respect to our conjecture about
relative frequencies. Traditionally, the expected entropy is simply called the entropy, while
we reserve this word for the random variable (7.19). Also commonly used is the name
information for $\bar S$, which invites linguistic paradoxes since ordinary language associates with
information a connotation of relevance or quality which is absent here. The classic book
on information theory by Brillouin [46] emphasizes this very carefully, by distinguishing
absolute information from its human value or meaning. Katz [132] uses the phrase missing
information.

The information deficit says nothing at all about the quality of the information contained in the summary $\rho$ of our past observations. An inappropriate $\rho$ can have arbitrarily small information deficit and still give a false account of reality. For example, if for some small $\varepsilon > 0$,

$$\rho_n = \varepsilon^{n-1}(1-\varepsilon) \quad\text{for } n = 1, 2, \ldots, \qquad (7.22)$$

expressing that the reading is expected to be nearly always 1 ($\rho_1 = 1-\varepsilon$) and hardly ever large, then

$$\bar S = -\bar k\Bigl(\log(1-\varepsilon) + \frac{\varepsilon}{1-\varepsilon}\log\varepsilon\Bigr) \to 0 \quad\text{as } \varepsilon\to 0.$$
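The closed form for $\bar S$ in the example (7.22) can be cross-checked numerically. In the minimal Python sketch below (helper names ours), the infinite sum is truncated at a large $n_{\max}$, an assumption of the sketch, and logarithms are taken to base 2, matching $\bar k = 1/\log 2$.

```python
import math

def expected_entropy(rho):
    """-k * sum_n rho_n log(rho_n) with k = 1/log 2, i.e. the entropy in bits."""
    return -sum(r * math.log2(r) for r in rho if r > 0)

def closed_form(eps):
    """-k (log(1 - eps) + eps/(1 - eps) log(eps)), the value stated for (7.22)."""
    return -(math.log(1 - eps) + eps / (1 - eps) * math.log(eps)) / math.log(2)

def geometric(eps, n_max=2000):
    """Truncation of rho_n = eps^(n-1) (1 - eps), n = 1, ..., n_max."""
    return [eps ** (n - 1) * (1 - eps) for n in range(1, n_max + 1)]

for eps in (0.1, 0.01, 0.001):
    print(eps, expected_entropy(geometric(eps)), closed_form(eps))
```

Both columns agree and shrink toward 0 as $\varepsilon$ decreases, although nothing in the computation refers to what the device actually does.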

Thus the information deficit can be made very small by the choice (7.22) with small $\varepsilon$, independent of whether this choice corresponds to the known facts. The real information value of $\rho$ depends instead on the care with which the past observations were interpreted, which is a matter of data analysis and not of our model of the device. If the data analysis is done poorly, the resulting expectations will simply not be matched by reality. This shows that the entropy reflects objective properties of the stochastic process and, contrary to claims in the literature, has nothing to do with our knowledge of the system, a subjective, ill-defined notion.



Relations to thermodynamics. Now suppose that the above setting happens at a very fast, unobservable time scale, so that we can actually observe only short time averages (7.18) of quantities of interest. Then $\bar f = \langle f\rangle$ simply has the interpretation of the time-independent observed value of the quantity $f$. The information deficit simply becomes the observed value of the entropy $S$. Since the information deficit counts the number of optimal decisions needed to completely specify a (microscopic) situation of which we know only (macroscopic) observed values, the observed value of the entropy quantifies the intrinsic (microscopic) complexity present in the system.

However, the unobservable high-frequency fluctuations of the device do not completely disappear from the picture. They show up in the fact that generally $\overline{f^2} \ne \bar f^2$, leading to a nonzero limit resolution (5.44) of Hermitian quantities. This is precisely the situation characteristic of the traditional treatment of thermodynamics within classical equilibrium statistical mechanics, if we assume that the system is ergodic, i.e., that population averages equal time averages. Then, all observed values are time-independent, described by equilibrium thermal variables. But the underlying high-frequency motions of the atoms making up a macroscopic substance are revealed by nonzero limit resolutions. However, the assumption that all systems for which thermodynamics works are ergodic is problematic; see, e.g., the discussion in Sklar [228].

Note that even a deterministic but chaotic high frequency dynamics, viewed at longer time 
scales, looks stochastic, and exactly the same remarks about the unobservable complexity 
and the observable consequences of fluctuations apply. Even if fluctuations are observable 
directly, these observations are intrinsically limited by the necessary crudity of any actual 
measurement protocol. For the best possible measurements (and only for these), the resolution of $f$ in the experiment is given by the limit resolution $\mathrm{res}(f)$, the size of the unavoidable fluctuations.

Due to the quantum structure of high frequency phenomena on an atomic or subatomic 
scale, it seems problematic to interpret thermodynamic limit resolutions in terms of a simple 
short time average of some underlying microscopic reality. Thus an information theoretic 
interpretation of the physical entropy seems questionable. 



7.7 Subjective probability 

The formalism of statistical mechanics is closely related to that used in statistics for random phenomena expressible in terms of exponential families; cf. Remark 6.1.2(viii). Exponential families play an important role in Bayesian statistics. Therefore a Bayesian, subjective probability interpretation of statistical mechanics is possible in terms of the knowledge of an observer, using an information theoretic approach. See, e.g., Balian [16] for a recent exposition in terms of physics, and Barndorff-Nielsen [22, 23] for a formal mathematical treatment. In such a treatment, the present integral plays the role of a noninformative prior, i.e., of the state considered to be least informative. This noninformative prior is often improper, i.e., not a probability distribution, since $\int 1$ need not be defined.

Motivated by the subjective, information theoretic approach to probability, Jaynes [122, 123] used the maximum entropy principle to derive the thermodynamic formalism. The maximum entropy principle asserts that one should model a system with the statistical distribution which maximizes the expected entropy subject to the known information about certain expectation values. This principle is sometimes considered as a rational, unprejudiced way of accounting for available information in incompletely known statistical models. Based on Theorem 5.3.3, it is not difficult to show that when the known information is given by the expectations of the quantities $X_1, X_2, \ldots$, the optimal state in the sense of the maximum entropy principle is a Gibbs state whose entropy is a linear combination of $1$ and the $X_k$.

However, the maximum entropy principle is an unreliable general purpose tool, and gives 
an appropriate distribution only under quite specific circumstances. 

7.7.1 Example. If we have information in the form of a large but finite sample of realizations $x(\omega_k)$ of a random variable $x$ in $N$ independent experiments $\omega_k$ ($k = 1, \ldots, N$), we can obtain from it approximate information about all moments $\langle x^n\rangle \approx N^{-1}\sum_{k=1}^N x(\omega_k)^n$ ($n = 0, 1, 2, \ldots$). It is not difficult to see that the maximum entropy principle would infer that the distribution of $x$ is discrete, namely that of the sample.

If we take as uninformative prior for a real-valued random variable $x$ the Lebesgue measure, $\int f(x) := \int f(x)\,dx$, and only know that the mean of $x$ is 1, say, the maximum entropy principle does not produce a sensible probability distribution. If we add the knowledge of the second moment $\langle x^2\rangle = 3/2$, say (so that the variance is $1/2$), we get a Gaussian distribution with mean 1 and standard deviation $1/\sqrt 2$. Adding the further knowledge of $\langle x^3\rangle$, the maximum entropy principle fails again to produce a sensible distribution. If, on the other hand, after knowing that $\langle x\rangle = 1$ we learn that the random variable is in fact nonnegative and integer-valued, this cannot be accounted for by the principle, and the probability of obtaining a negative value remains large. But if we take as prior the discrete measure on nonnegative integers defined by $\int f(x) := \sum_{x=0}^\infty f(x)/x!$, the supposedly noninformative prior has become much more informative, and the knowledge of the mean produces via the maximum entropy principle a Poisson distribution.
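The last claim can be illustrated numerically. Relative to the prior weights $1/x!$, the maximum entropy state with prescribed mean has the Gibbs form $\rho(x) \propto e^{-\lambda x}/x!$. The Python sketch below (helper names ours; truncating the support at $x_{\max}$ is an assumption of the sketch) fits $\lambda$ by bisection and compares the result with the Poisson distribution of the same mean.

```python
import math

def gibbs(lam, x_max=200):
    """Probabilities p(x) proportional to exp(-lam*x)/x! for x = 0, ..., x_max."""
    logw = [-lam * x - math.lgamma(x + 1) for x in range(x_max + 1)]
    m = max(logw)                          # shift by the maximum to avoid overflow
    w = [math.exp(l - m) for l in logw]
    z = sum(w)
    return [wi / z for wi in w]

def mean(p):
    return sum(x * px for x, px in enumerate(p))

def fit_lambda(mu, lo=-10.0, hi=10.0, iters=100):
    """Bisection for lam with mean(gibbs(lam)) = mu; the mean decreases as lam grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(gibbs(mid)) > mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

mu = 2.5
lam = fit_lambda(mu)
p = gibbs(lam)
poisson = [math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1)) for x in range(len(p))]
print(lam, -math.log(mu), max(abs(a - b) for a, b in zip(p, poisson)))
```

The fitted multiplier is $\lambda = -\log\mu$, and the distribution coincides with the Poisson distribution up to rounding; the shape was fixed by the prior, not by the single constraint.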

If we know that a random variable $x$ is nonnegative and has $\langle x^2\rangle = 1$, the Lebesgue measure on $\mathbb R_+$ as noninformative prior gives for $x$ a distribution with density $\sqrt{2/\pi}\,e^{-x^2/2}$. But we can consider instead our knowledge about $y = x^2$, which is nonnegative and has $\langle y\rangle = 1$; the same noninformative prior now gives for $y$ a distribution with density $e^{-y}$. The distribution of $x = \sqrt y$ resulting from this has density $2x\,e^{-x^2}$. Thus the result depends on whether we regard $x$ or $y$ as the relevant variable.
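The dependence on the choice of variable is easy to confirm numerically. The Python sketch below is ours; a midpoint rule on $[0, 20]$ is an assumed stand-in for integration over $\mathbb R_+$. It checks that both densities are normalized and satisfy $\langle x^2\rangle = 1$, yet have different means.

```python
import math

def integrate(f, a=0.0, b=20.0, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def density_x(x):
    """Maximum entropy density of x >= 0 given <x^2> = 1 (Lebesgue prior in x)."""
    return math.sqrt(2 / math.pi) * math.exp(-x * x / 2)

def density_via_y(x):
    """Density of x = sqrt(y) when y = x^2 has the maximum entropy density exp(-y)."""
    return 2 * x * math.exp(-x * x)

for rho in (density_x, density_via_y):
    norm = integrate(rho)
    second = integrate(lambda x: x * x * rho(x))
    first = integrate(lambda x: x * rho(x))
    print(rho.__name__, round(norm, 6), round(second, 6), round(first, 4))
```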

We see that the choice of expectations to be used as constraints reflects prior assumptions
about which expectations are likely to be relevant. Moreover, the prior, far from being 
uninformative, reflects the prejudice assumed in the complete absence of knowledge. The 
prior which must be assumed to describe the state of complete ignorance significantly 
affects the results of the maximum entropy principle, and hence makes the application of 
the principle ambiguous. 

The application of the maximum entropy principle becomes reliable only if the information is available in the form of the expectation values of a sufficient statistic of the true model; see, e.g., Barndorff-Nielsen [22]. Which statistical model may be considered sufficient depends on the true situation and is difficult to assess in advance.

In particular, a Bayesian interpretation of statistical mechanics in the manner of Jaynes is appropriate if and only if (i) correct, complete and sufficiently accurate information about the expectation of all relevant quantities is assumed to be known, and (ii) the noninformative prior is fixed by the constructions of Example 5.1.8, namely as the correctly weighted Liouville measure in classical physics and as the microcanonical ensemble (the trace) in quantum physics. Only this guarantees that the knowledge assumed and hence the results obtained are completely impersonal and objective, as required for scientific results, and agree with standard thermodynamics, as required for agreement with nature. However, this kind of knowledge is clearly completely hypothetical and has nothing to do with the real, subjective knowledge of real observers.



Part III 

Lie algebras and Poisson algebras 
Chapter 8
Lie algebras 



Part III introduces the basics about Lie algebras and Lie groups, with an emphasis on the 
concepts most relevant to the conceptual side of physics. 



This chapter introduces Lie algebras together with the slightly richer structure of a Lie *-algebra usually encountered in the mechanical applications. We introduce tools for verifying the Jacobi identity, and establish the latter both for the Poisson bracket of a classical harmonic oscillator and, for quantum systems, for the commutator of linear operators.

Further Lie algebras arise as algebras of matrices closed under commutation, as algebras of derivations in associative algebras, as centralizers or quotient algebras, and by complexification. An overview of semisimple Lie algebras and their classification concludes the chapter.



In finite dimensions, the relation between Lie algebras and Lie groups is almost one-to-one, the "almost" being due to the fact that the so-called universal covering group of a finite-dimensional Lie algebra (defined in Section 10.4) may have a nontrivial discrete normal subgroup.

Many finite-dimensional Lie groups arise as groups of square invertible matrices, and we 
discuss the most important families, in particular the unitary and the orthogonal groups. 
We introduce group representations, which relate groups of matrices (or linear operators) to 
abstract Lie groups, and will turn out to be most important for understanding the spectrum 
of quantum systems. 

Of particular importance for systems of oscillators are the Heisenberg groups, the universal 
covering groups of the Heisenberg algebras. Their product law is given by the famous 
Weyl relations, which are an exactly representable case of the Baker-Campbell-Hausdorff 
formula valid for many other Lie groups, in particular for arbitrary finite-dimensional ones. 
We also discuss the Poincaré group. This is the symmetry group of space-time, and forms
the basis for relativity theory. 
8.1 Basic definitions

We start with the definition of a Lie algebra over a field $\mathbb K$, usually implicitly given by the context. In this book, $\mathbb K$ is either the field $\mathbb C$ of complex numbers or, occasionally, the field $\mathbb R$ of real numbers. Lie algebras over other fields, such as the rationals $\mathbb Q$ or finite fields $\mathbb Z_p$ for $p$ prime, also have interesting applications in mathematics, physics and engineering, but these are outside the scope of this book. To denote the Lie product, we use the symbol $\angle$ introduced at the end of Section 1.2. (This replaces other, bracket-based notations common in the literature.)

8.1.1 Definition.

(i) A Lie product on a vector space $L$ over $\mathbb K$ is a bilinear operation $\angle$ on $L$ satisfying

(L1) $f\angle f = 0$,

(L2) $f\angle(g\angle h) + g\angle(h\angle f) + h\angle(f\angle g) = 0$ for all $f, g, h\in L$.
Equation (L2) is called the Jacobi identity.

(ii) For subsets $A, B$ of $L$, we write

$$A\angle B := \{f\angle g \mid f\in A,\ g\in B\},$$

and for $f, g\in L$,

$$A\angle g := A\angle\{g\}, \qquad f\angle B := \{f\}\angle B.$$

(iii) A Lie algebra over $\mathbb K$ is a vector space $L$ over $\mathbb K$ with a distinguished Lie product.

Elements $f\in L$ with $f\angle L = \{0\}$ are called (Lie) central; the set $Z(L)$ of all these elements is called the center of $L$. A real (complex) Lie algebra is a Lie algebra over $\mathbb K = \mathbb R$ (resp. $\mathbb K = \mathbb C$). Unless confusion is possible, we use the same symbol $\angle$ for the Lie product in different Lie algebras.

Clearly, if $f\angle g$ defines a Lie product of $f$ and $g$, so does $f\angle' g := t\,(f\angle g)$ for all $t\in\mathbb K$. Thus the same vector space may be a Lie algebra in different ways.

In physics, finite-dimensional Lie algebras are often defined in terms of basis elements $X_j$ called generators and structure constants $c_{jkl}$, such that

$$X_j\angle X_k = \sum_l c_{jkl}X_l. \qquad (8.1)$$

By taking linear combinations and using the bilinearity of the Lie product, the structure constants determine the Lie product completely. Conversely, since the generators form a basis, the structure constants are determined uniquely by the basis. They depend, however, on the basis chosen. Frequently, there are distinguished bases with a physical interpretation in which the structure constants are particularly simple, and most of them vanish. If a basis and the structure constants are given, many Lie algebra computations can be done automatically; important software packages include LiE (van Leeuwen et al. [251]) and LTP (Torres-Torriti [242]). In this book, we usually prefer a basis-free approach, resorting to basis-dependent formulas only to make connections with traditional physics notation.
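As an illustration of how structure constants determine the Lie product, consider $\mathbb R^3$ with the cross product as Lie product; in the standard basis its structure constants are the Levi-Civita symbols $c_{jkl} = \varepsilon_{jkl}$. This example and the helper names are ours, and indices run over $0, 1, 2$. The Python sketch recovers the product of arbitrary coordinate vectors from (8.1) by bilinearity and checks (L1) and (L2) on sample vectors.

```python
def levi_civita(j, k, l):
    """epsilon_{jkl} for indices in {0, 1, 2}."""
    return (j - k) * (k - l) * (l - j) // 2

N = 3
C = [[[levi_civita(j, k, l) for l in range(N)] for k in range(N)] for j in range(N)]

def bracket(x, y, c=C):
    """Lie product from (8.1) by bilinearity: (x ∠ y)_l = sum_{j,k} x_j y_k c_{jkl}."""
    n = len(x)
    return [sum(x[j] * y[k] * c[j][k][l] for j in range(n) for k in range(n))
            for l in range(n)]

def cross(x, y):
    return [x[1] * y[2] - x[2] * y[1], x[2] * y[0] - x[0] * y[2], x[0] * y[1] - x[1] * y[0]]

def add(*vs):
    return [sum(t) for t in zip(*vs)]

x, y, z = [1, -2, 3], [4, 0, -1], [2, 5, 7]
print(bracket(x, y), cross(x, y))          # the same vector
jacobi = add(bracket(x, bracket(y, z)), bracket(y, bracket(z, x)), bracket(z, bracket(x, y)))
print(jacobi)                               # the zero vector, as required by (L2)
```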

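As a consequence of (L1) (and in fact equivalent to it), we have the following antisymmetry property:

$$f\angle g = -\,g\angle f \quad\text{for all } f, g\in L.$$

This follows from observing that $f\angle 0 = 0\angle f = 0$ and

$$0 = (f+g)\angle(f+g) = f\angle f + f\angle g + g\angle f + g\angle g = f\angle g + g\angle f.$$

(Conversely, antisymmetry gives $f\angle f = -f\angle f$, hence $2\,f\angle f = 0$, which implies (L1) since $2\ne 0$ in $\mathbb R$ and $\mathbb C$.)

Using the antisymmetry property of the Lie product, one can write the Jacobi identity in two other important forms, each equivalent to the Jacobi identity:

$$f\angle(g\angle h) = (f\angle g)\angle h + g\angle(f\angle h), \qquad (8.2)$$

$$(f\angle g)\angle h = (f\angle h)\angle g + f\angle(g\angle h). \qquad (8.3)$$

These formulas say that one can apply the Lie product to a compound expression in a manner familiar from the product rule for differentiation.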
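The equivalent forms (8.2) and (8.3) can be spot-checked for a concrete Lie product. The Python sketch below (ours) uses the matrix commutator, which Theorem 8.3.1 shows to be a Lie product, on three arbitrarily chosen integer matrices, so that all arithmetic is exact.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def madd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def msub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lie(f, g):
    """Commutator f ∠ g := fg - gf."""
    return msub(matmul(f, g), matmul(g, f))

f = [[1, 2, 0], [0, -1, 3], [4, 0, 2]]
g = [[0, 1, -1], [2, 0, 0], [1, 1, 1]]
h = [[3, 0, 1], [-1, 2, 0], [0, 5, -2]]

# Each difference "left-hand side minus right-hand side" should vanish.
form_82 = msub(lie(f, lie(g, h)), madd(lie(lie(f, g), h), lie(g, lie(f, h))))
form_83 = msub(lie(lie(f, g), h), madd(lie(lie(f, h), g), lie(f, lie(g, h))))
print(form_82, form_83)   # both are zero matrices
```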

An important but somewhat trivial class of Lie algebras are the abelian Lie algebras, where $f\angle g = 0$ for all $f, g\in L$. It is trivial to check that (L1) and (L2) are satisfied. Clearly, every vector space can be turned into an abelian Lie algebra by defining $f\angle g = 0$ for all vectors $f$ and $g$. In particular, the field $\mathbb K$ itself and the center of any Lie algebra are abelian Lie algebras.

A subspace $L'$ of a Lie algebra $L$ is a Lie subalgebra if it is closed under the Lie product, i.e., if $f\angle g\in L'$ for all $f, g\in L'$. In this case, the restriction of the Lie product $\angle$ of $L$ to $L'$ turns $L'$ into a Lie algebra. That is, a Lie subalgebra is a subspace that is a Lie algebra with the same Lie product. (For example, the subspace $\mathbb K f$ spanned by an arbitrary element $f$ of a Lie algebra is an abelian Lie subalgebra.) A Lie subalgebra is nontrivial if it is not the whole Lie algebra and contains a nonzero element.

The property (L1) is usually easy to check. It is harder to check the Jacobi identity (L2) for a proposed Lie product; direct calculations can be quite messy when many terms have to be calculated before one finds that they all cancel. Since we will encounter many Lie products that must be verified to satisfy the Jacobi identity, we first develop some technical machinery to make life easier, or at least more structured. For a given bilinear binary operation $\circ$ on $L$, we define the associator of $f, g, h\in L$ as

$$[f, g, h] := (f\circ g)\circ h - f\circ(g\circ h). \qquad (8.4)$$

8.1.2 Proposition. If the associator of a bilinear operation $\circ$ on $L$ satisfies

$$[f,g,h] + [g,h,f] + [h,f,g] - [f,h,g] - [h,g,f] - [g,f,h] = 0, \qquad (8.5)$$

then

$$f\angle g := f\circ g - g\circ f$$

defines a Lie product on $L$.

Proof. Define

$$J(f,g,h) := f\angle(g\angle h) + g\angle(h\angle f) + h\angle(f\angle g),$$

and

$$S(f,g,h) := [f,g,h] + [g,h,f] + [h,f,g] - [f,h,g] - [h,g,f] - [g,f,h].$$

Writing out $S(f,g,h)$ and $J(f,g,h)$ with $f\angle g := f\circ g - g\circ f$, one sees $J(f,g,h) = -S(f,g,h)$; hence if $S(f,g,h) = 0$ for all $f$, $g$ and $h$, then the Jacobi identity is satisfied for all $f$, $g$ and $h$. The antisymmetry property $f\angle f = 0$ is trivial. □
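The identity $J = -S$ in this proof holds for every bilinear operation $\circ$, associative or not, so it can be spot-checked with a randomly chosen nonassociative product. In the Python sketch below (ours), the random integer structure constants are purely hypothetical test data.

```python
import random

def random_product(n, seed):
    """A hypothetical bilinear product on Z^n: (f o g)_l = sum_{j,k} f_j g_k a[j][k][l]."""
    rng = random.Random(seed)
    a = [[[rng.randint(-3, 3) for _ in range(n)] for _ in range(n)] for _ in range(n)]
    def op(f, g):
        return [sum(f[j] * g[k] * a[j][k][l] for j in range(n) for k in range(n))
                for l in range(n)]
    return op

def add(*vs):
    return [sum(t) for t in zip(*vs)]

def sub(u, v):
    return [x - y for x, y in zip(u, v)]

def assoc(op, f, g, h):
    """Associator (8.4): [f, g, h] = (f o g) o h - f o (g o h)."""
    return sub(op(op(f, g), h), op(f, op(g, h)))

def S(op, f, g, h):
    """Left-hand side of (8.5)."""
    return sub(add(assoc(op, f, g, h), assoc(op, g, h, f), assoc(op, h, f, g)),
               add(assoc(op, f, h, g), assoc(op, h, g, f), assoc(op, g, f, h)))

def J(op, f, g, h):
    """Jacobi expression for the commutator product f ∠ g := f o g - g o f."""
    lie = lambda x, y: sub(op(x, y), op(y, x))
    return add(lie(f, lie(g, h)), lie(g, lie(h, f)), lie(h, lie(f, g)))

op = random_product(3, seed=1)
f, g, h = [1, 0, -2], [3, 1, 1], [0, -1, 4]
print(J(op, f, g, h), S(op, f, g, h))   # J is the negative of S
```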

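8.1.3 Theorem. The binary operation $\angle$ defined on the vector space $C^\infty(\mathbb R\times\mathbb R)$ by

$$f\angle g := f_p g_q - g_p f_q,$$

where $f_p = \partial f/\partial p$ and $f_q = \partial f/\partial q$, is a Lie product.

Proof. We calculate the associator for the bilinear operation $f\circ g = f_p g_q$. We have

$$\begin{aligned}
[f,g,h] &= (f\circ g)_p h_q - f_p (g\circ h)_q\\
&= (f_p g_q)_p h_q - f_p (g_p h_q)_q\\
&= f_{pp} g_q h_q + f_p g_{qp} h_q - f_p g_{pq} h_q - f_p g_p h_{qq}\\
&= f_{pp} g_q h_q - f_p g_p h_{qq},
\end{aligned}$$

since $g_{qp} = g_{pq}$ for smooth $g$. Writing the cyclic permutations we get

$$[f,g,h] + [g,h,f] + [h,f,g] = f_{pp} g_q h_q + g_{pp} h_q f_q + h_{pp} f_q g_q - f_p g_p h_{qq} - g_p h_p f_{qq} - h_p f_p g_{qq},$$

which is symmetric in $f, g$. Being a cyclic sum, it is therefore symmetric in all three arguments, so it equals $[f,h,g] + [h,g,f] + [g,f,h]$, and the identity (8.5) is satisfied. Proposition 8.1.2 therefore implies that $\angle$ is a Lie product.

The reader is invited to prove this result also by a direct calculation. □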
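The theorem can also be spot-checked by exact computer algebra on polynomials. The self-contained Python sketch below is ours; storing a polynomial in $p, q$ as a dictionary `{(i, j): c}` for $\sum c\,p^iq^j$ is an implementation choice, not taken from the text. It implements $f\angle g = f_pg_q - g_pf_q$ and verifies antisymmetry and the Jacobi identity for three sample polynomials.

```python
from collections import defaultdict

# A polynomial in p, q is stored as {(i, j): c}, meaning the sum of c * p**i * q**j.

def clean(f):
    return {k: c for k, c in f.items() if c != 0}

def d_p(f):
    return clean({(i - 1, j): i * c for (i, j), c in f.items() if i > 0})

def d_q(f):
    return clean({(i, j - 1): j * c for (i, j), c in f.items() if j > 0})

def mul(f, g):
    out = defaultdict(int)
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return clean(out)

def add(*fs):
    out = defaultdict(int)
    for f in fs:
        for k, c in f.items():
            out[k] += c
    return clean(out)

def neg(f):
    return {k: -c for k, c in f.items()}

def lie(f, g):
    """Poisson bracket of Theorem 8.1.3: f ∠ g = f_p g_q - g_p f_q."""
    return add(mul(d_p(f), d_q(g)), neg(mul(d_p(g), d_q(f))))

p, q = {(1, 0): 1}, {(0, 1): 1}
f = {(2, 1): 1, (0, 3): 3}      # p^2 q + 3 q^3
g = {(1, 2): 1, (3, 0): -1}     # p q^2 - p^3
h = {(4, 0): 1, (1, 1): 1}      # p^4 + p q
jacobi = add(lie(f, lie(g, h)), lie(g, lie(h, f)), lie(h, lie(f, g)))
print(lie(p, q), jacobi)        # {(0, 0): 1} {}
```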

We end this section by introducing some concepts needed at various later points but collected here for convenience. If $L$ and $L'$ are Lie algebras, we call a linear map $\phi: L\to L'$ a homomorphism (of Lie algebras) if

$$\phi(f\angle g) = \phi(f)\angle\phi(g)$$

for all $f, g\in L$. Note that the left-hand side involves the Lie product in $L$, whereas the right-hand side involves the Lie product in $L'$. An injective homomorphism is called an embedding of $L$ into $L'$. We call two Lie algebras $L$ and $L'$ isomorphic if there is a homomorphism $\phi: L\to L'$ and a homomorphism $\psi: L'\to L$ such that $\psi\circ\phi$ is the identity on $L$ and $\phi\circ\psi$ is the identity on $L'$. Then $\phi$ is called an isomorphism, and $\psi$ is the inverse isomorphism.
Given a Lie algebra $L$ and a subalgebra $L'$, the centralizer $C_{L'}(S)$ in $L'$ of a subset $S\subseteq L$ is defined by

$$C_{L'}(S) := \{f\in L' \mid f\angle g = 0 \text{ for all } g\in S\}.$$

In words, $C_{L'}(S)$ consists of all the elements in $L'$ that Lie commute with all elements in $S$. One may use the Jacobi identity to see that $C_{L'}(S)$ is a Lie subalgebra of $L'$.

An ideal of $L$ is a subspace $I\subseteq L$ such that $f\angle g\in I$ for all $f\in L$ and for all $g\in I$. In other notation, $L\angle I = I\angle L\subseteq I$. Note that $0$ and $L$ itself are always ideals; they are called the trivial ideals. Also, the center of a Lie algebra is always an ideal. A less trivial ideal is the derived Lie algebra $L^{(1)}$ of $L$, consisting of all elements that can be written as a finite sum of elements of $L\angle L$. If $I\subseteq L$ is an ideal in $L$, one may form the quotient Lie algebra $L/I$, whose elements are the equivalence classes $[f] := \{g\in L \mid f - g\in I\}$, with addition, scalar multiplication, and Lie product given by

$$[f] + [g] := [f+g], \qquad \alpha[f] := [\alpha f], \qquad [f]\angle[g] := [f\angle g].$$

It is well known that the vector space operations are well defined. The Lie product is well defined since $f'\in[f]$ implies $f' - f\in I$, hence $(f'-f)\angle g\in I$ and $[f']\angle[g] = [f'\angle g] = [f\angle g] + [(f'-f)\angle g] = [f\angle g]$; by antisymmetry, the same argument applies to the second factor.

If $L$ and $L'$ are Lie algebras, their direct sum $L\oplus L'$ is the direct sum of the vector spaces equipped with the Lie product defined by

$$(x + x')\angle(y + y') := x\angle y + x'\angle y'$$

for all $x, y\in L$ and all $x', y'\in L'$. It is easily verified that the axioms are satisfied.



8.2 Lie algebras from derivations 

Equation (8.2),

$$f\angle(g\angle h) = (f\angle g)\angle h + g\angle(f\angle h),$$

resembles the product rule for (partial) differentiation,

$$\frac{\partial}{\partial x}(gh) = \frac{\partial g}{\partial x}\,h + g\,\frac{\partial h}{\partial x}.$$

To make the similarity more apparent, we introduce for every element $f\in L$ a linear operator $\mathrm{ad}_f: L\to L$, the derivative in direction $f$, given by

$$\mathrm{ad}_f\,g := f\angle g.$$

The notation reflects the fact that the operator $\mathrm{ad}: L\to\mathrm{Lin}\,L$ defined by

$$\mathrm{ad}\,f := \mathrm{ad}_f$$

is the adjoint representation of $L$; see Sections 10.3 and 10.6.

Note that an element $f\in L$ is in the center $Z(L) = C_L(L)$ of $L$ if and only if the linear operator $\mathrm{ad}_f$ is zero.
8.2.1 Example. For $f$ in the Lie algebra $C^\infty(\mathbb R\times\mathbb R)$ constructed in Theorem 8.1.3, we have

$$\mathrm{ad}_f\,g = f\angle g = f_p g_q - f_q g_p = \Bigl(f_p\frac{\partial}{\partial q} - f_q\frac{\partial}{\partial p}\Bigr)g. \qquad (8.6)$$

The vector field $X_f$ on $\mathbb R\times\mathbb R$ defined by the coefficients of $\mathrm{ad}_f$ is called the Hamiltonian vector field defined by $f$; cf. Chapter 12. In particular, the Hamiltonian derivative operators with respect to $p$ and $q$ take the explicit form

$$X_p = \frac{\partial}{\partial q}, \qquad X_q = -\frac{\partial}{\partial p},$$

and we have

$$\mathrm{ad}_f = f_p X_p + f_q X_q.$$



With the convention that operators bind more strongly than the Lie product, the Jacobi identity can be written in the form

$$\mathrm{ad}_f(g\angle h) = \mathrm{ad}_f\,g\angle h + g\angle\mathrm{ad}_f\,h.$$

The Jacobi identity is thus equivalent to saying that, for every $f$, the operator $\mathrm{ad}_f$ is a derivation of the Lie algebra.
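For a concrete check, take $\mathbb R^3$ with the cross product as Lie product (our choice of example). The Python sketch below represents $\mathrm{ad}_f$ as a $3\times 3$ matrix, verifies the derivation property just displayed, and also checks $\mathrm{ad}_{f\angle g} = [\mathrm{ad}_f, \mathrm{ad}_g]$, which is (8.2) rewritten in operator form.

```python
def cross(x, y):
    return [x[1] * y[2] - x[2] * y[1], x[2] * y[0] - x[0] * y[2], x[0] * y[1] - x[1] * y[0]]

BASIS = ([1, 0, 0], [0, 1, 0], [0, 0, 1])

def ad(f):
    """Matrix of ad_f: g -> f x g; column k is f x e_k."""
    cols = [cross(f, e) for e in BASIS]
    return [[cols[k][i] for k in range(3)] for i in range(3)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(3)] for i in range(3)]

f, g, h = [1, 2, -1], [0, 3, 4], [-2, 1, 5]
lhs = matvec(ad(f), cross(g, h))                                          # ad_f (g ∠ h)
rhs = [a + b for a, b in zip(cross(matvec(ad(f), g), h), cross(g, matvec(ad(f), h)))]
print(lhs == rhs, ad(cross(f, g)) == commutator(ad(f), ad(g)))             # True True
```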

8.2.2 Definition.

(i) A derivation of a vector space $A$ with a bilinear product $\circ$ is a linear map $\delta: A\to A$ satisfying

$$\delta(f\circ g) = \delta f\circ g + f\circ\delta g$$

for all $f, g\in A$. We denote by $\mathrm{Der}\,A$ the set of all derivations of $A$. (In the cases of interest, $A$ is an associative algebra with the associative product as $\circ$, or a Lie algebra with the Lie product as $\circ$.)

(ii) If $E$ is an associative algebra, a (left) $E$-module is an additive abelian group $V$ together with a multiplication mapping which assigns to $f\in E$ and $x\in V$ a product $fx\in V$ such that

$$f(x+y) = fx + fy, \qquad (f+g)x = fx + gx, \qquad f(gx) = (fg)x$$

for all $f, g\in E$ and all $x, y\in V$.

8.2.3 Proposition. The commutator of two derivations is a derivation. In particular, $\mathrm{Der}\,E$ is a Lie subalgebra of $\mathrm{Lin}\,E$ with Lie product

$$\delta\angle\delta' := [\delta, \delta'].$$

Moreover, if $E$ is commutative, $\delta\in\mathrm{Der}\,E$ and $f\in E$, then the product $f\delta$ defined by

$$(f\delta)g := f(\delta g)$$

is a derivation, and this turns $\mathrm{Der}\,E$ into an $E$-module.
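Proof. Since $\mathrm{Der}\,E$ is a linear vector space, and since the antisymmetry property and the Jacobi identity are already satisfied in $\mathrm{Lin}\,E$, we only need to check that the Lie product of two derivations is again a derivation. We have

$$\begin{aligned}
(\delta\angle\delta')(fg) &= (\delta\delta')(fg) - (\delta'\delta)(fg)\\
&= \delta\bigl((\delta'f)g + f(\delta'g)\bigr) - \delta'\bigl((\delta f)g + f(\delta g)\bigr)\\
&= (\delta\delta'f)g + (\delta'f)(\delta g) + (\delta f)(\delta'g) + f(\delta\delta'g)\\
&\qquad - (\delta'\delta f)g - (\delta f)(\delta'g) - (\delta'f)(\delta g) - f(\delta'\delta g)\\
&= \bigl((\delta\angle\delta')f\bigr)g + f\bigl((\delta\angle\delta')g\bigr),
\end{aligned}$$

since the mixed terms cancel. This proves the first part. The second part is straightforward. □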
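The first part of the proposition can be spot-checked on the commutative polynomial algebra $E = \mathbb K[x]$, restricted to integer coefficients so that the arithmetic is exact, with the derivations $d/dx$ and $x^2\,d/dx$ (our choice of example). The Python sketch below verifies the product rule for their commutator and shows, for contrast, that the composition $(d/dx)^2$ is not a derivation.

```python
# A polynomial sum_i a_i x^i with integer coefficients is stored as [a_0, a_1, ...].

def trim(f):
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def padd(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return trim([a + b for a, b in zip(f, g)])

def pneg(f):
    return [-a for a in f]

def pmul(f, g):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)

def d(f):
    """The derivation d/dx."""
    return trim([i * a for i, a in enumerate(f)][1:])

def x2d(f):
    """The derivation x^2 d/dx."""
    return pmul([0, 0, 1], d(f))

def commutator(a, b):
    return lambda f: padd(a(b(f)), pneg(b(a(f))))

def is_leibniz(delta, f, g):
    return delta(pmul(f, g)) == padd(pmul(delta(f), g), pmul(f, delta(g)))

f, g = [1, -2, 0, 3], [0, 4, 1]            # 1 - 2x + 3x^3 and 4x + x^2
bracket = commutator(d, x2d)                # [d/dx, x^2 d/dx] = 2x d/dx
print(is_leibniz(bracket, f, g))            # True
print(is_leibniz(lambda u: d(d(u)), f, g))  # False: a product of derivations is not one
```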



8.2.4 Proposition. The centralizer of a subset $S$ of $\mathrm{Lin}\,E$, defined as

$$C(S) := \{\delta\in\mathrm{Der}\,E \mid A\angle\delta = 0 \text{ for all } A\in S\},$$

is a Lie subalgebra of $\mathrm{Der}\,E$.

Proof. As before, we only need to prove that the Lie product closes within $C(S)$. If $\delta, \delta'\in C(S)$, the Jacobi identity in the form (8.2) implies

$$A\angle(\delta\angle\delta') = (A\angle\delta)\angle\delta' + \delta\angle(A\angle\delta') = 0.$$

□



8.3 Linear groups and their Lie algebras 

In quantum mechanics, linear operators play a central role; they appear in two essentially different ways. Operators describing time evolution and canonical transformations are linear operators $U$ on a Hilbert space that are unitary in the sense that $U^*U = UU^* = 1$, and hence bounded.¹ The unitary operators form a group, which in many cases of interest is a so-called Lie group.

On the other hand, many important quantities in quantum mechanics are described in terms of unbounded linear operators that are defined not on the whole Hilbert space but only on a dense subspace. Usually, the linear operators of interest have a common dense domain $\mathbb H$ on which they are defined and which they map into itself. $\mathbb H$ inherits from the Hilbert space the Hermitian inner product, hence is a complex Euclidean space, and the Hilbert space can be reconstructed from $\mathbb H$ as the completion $\overline{\mathbb H}$ of $\mathbb H$ by equivalence classes of Cauchy sequences, in the way familiar from the construction of the real numbers from the rationals. We therefore consider the algebra $\mathrm{Lin}\,\mathbb H$ of continuous linear operators on a Euclidean space $\mathbb H$, with composition as associative product.

¹ The bounded operators on a Hilbert space form a so-called C*-algebra; see, for example, Rickart [207], Baggett [14], or Werner [257]. But we do not use this fact.
In this section, we define the basic concepts relevant for a study of groups and Lie algebras inside algebras of operators. Since for these concepts neither the operator structure nor the coefficient field matters in most cases (as long as the characteristic is not two), we provide a slightly more general framework. In the next section, we apply the general framework to the algebra $\mathbb C^{n\times n} = \mathrm{Lin}\,\mathbb C^n$ of complex $n\times n$-matrices, considered in the standard way as linear operators on the space of column vectors with $n$ complex entries. Many of the Lie groups and Lie algebras arising in the applications are naturally defined as subgroups or subspaces of this algebra.

An (associative) algebra over a field $\mathbb K$ is a vector space $E$ over $\mathbb K$ with a bilinear, associative multiplication. For example, every *-algebra is an associative algebra over $\mathbb C$. As is traditional, the product of an associative algebra (and in particular that of $\mathrm{Lin}\,\mathbb H$ and $\mathbb K^{n\times n}$) is written by juxtaposition. An associative algebra $E$ is called commutative if $fg = gf$ for all $f, g\in E$, and noncommutative otherwise. In many cases we assume that such an algebra has a unit element $1$ with respect to multiplication; after the identification of the multiples of $1$ with the elements of $\mathbb K$, this is equivalent to assuming that $\mathbb K\subseteq E$. If $E$ and $E'$ are associative algebras over $\mathbb K$ with $1$, then a $\mathbb K$-linear map $\phi: E\to E'$ is an algebra homomorphism if $\phi(fg) = \phi(f)\phi(g)$ and $\phi(1) = 1$. Often we omit the reference to the ground field $\mathbb K$ and assume a ground field has been chosen.

We now show that every associative algebra has many Lie products, and thus can be made in many ways into a Lie algebra. For commutative associative algebras, the construction is of little use, since it only leads to abelian Lie algebras.

8.3.1 Theorem. Let $E$ be an associative algebra. Then, for every $J\in E$, the binary operation $\angle_J$ defined on $E$ by

$$f\angle_J g := fJg - gJf$$

is a Lie product. In particular ($J = 1$), the binary operation $\angle$ defined on $E$ by

$$f\angle g := [f, g],$$

where

$$[f, g] := fg - gf$$

denotes the commutator of $f$ and $g$, is a Lie product.

Proof. We compute the associator (8.4) for the bilinear operation $f\circ g := fJg$:

$$[f, g, h] = (f\circ g)Jh - fJ(g\circ h) = fJgJh - fJgJh = 0,$$

by associativity. Hence the associator of $\circ$ satisfies (8.5), and $\angle_J$ is a Lie product. □

Note that $Jf\angle Jg = J(f\angle_J g)$. Hence the corresponding Lie algebras are isomorphic when $J$ is invertible.
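A quick exact check of Theorem 8.3.1 and of the preceding remark, for $E = \mathbb K^{2\times 2}$ with integer entries and an arbitrarily chosen $J$ (our test data, nothing canonical): the Python sketch below verifies the Jacobi identity for $\angle_J$ and the identity $J(f\angle_J g) = Jf\angle Jg$.

```python
def matmul(*ms):
    """Product of several square matrices given as lists of rows."""
    out = ms[0]
    for m in ms[1:]:
        n = len(out)
        out = [[sum(out[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return out

def msub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def madd(*ms):
    return [[sum(t) for t in zip(*rows)] for rows in zip(*ms)]

def lie_J(f, g, J):
    """Twisted Lie product f ∠_J g := fJg - gJf."""
    return msub(matmul(f, J, g), matmul(g, J, f))

def lie(f, g):
    """Commutator, the case J = 1."""
    return msub(matmul(f, g), matmul(g, f))

J = [[0, 1], [-1, 2]]
f, g, h = [[1, 2], [3, 4]], [[0, -1], [5, 2]], [[2, 0], [1, -3]]

jacobi = madd(lie_J(f, lie_J(g, h, J), J), lie_J(g, lie_J(h, f, J), J), lie_J(h, lie_J(f, g, J), J))
print(jacobi)                                                        # zero matrix
print(matmul(J, lie_J(f, g, J)) == lie(matmul(J, f), matmul(J, g)))  # True
```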

If $E$ and $E'$ are two associative algebras with unity, we may turn them into Lie algebras by putting $f\angle g := [f, g]$ in both $E$ and $E'$. We denote by $L$ and $L'$ the Lie algebras associated to $E$ and $E'$, respectively. If $\phi$ is an algebra homomorphism from $E$ to $E'$, then $\phi$ induces a Lie algebra homomorphism between the Lie algebras $L$ and $L'$. Indeed,

$$\phi(f\angle g) = \phi(fg - gf) = \phi(f)\phi(g) - \phi(g)\phi(f) = \phi(f)\angle\phi(g).$$

Theorem 8.3.1 applies in particular to $E = \mathbb K^{n\times n}$. The Lie algebra $\mathbb K^{n\times n}$ with Lie product $f\angle g := [f, g]$ is called the general linear algebra $gl(n,\mathbb K)$ over $\mathbb K$. If $\mathbb K = \mathbb C$, we simply write $gl(n) = gl(n,\mathbb C)$; similar abbreviations apply without notice for the names of other Lie algebras introduced later.

8.3.2 Definition.

(i) A Hausdorff *-algebra is a *-algebra $E$ with a Hausdorff topology in which addition, multiplication, and conjugation are continuous. An element $f\in E$ is called complete if the initial-value problem

$$\dot U(t) = fU(t), \qquad U(0) = 1 \qquad (8.7)$$

has a unique solution $U: \mathbb R\to E$. Then the mapping $U$ is called a one-parameter group with infinitesimal generator $f$, and we write $e^{tf} := U(t)$; this notation is unambiguous since it is easily checked that $e^{(s+t)f} = e^{sf}e^{tf}$ for $s, t\in\mathbb R$. An element $f\in E$ is called self-adjoint if $f^* = f$ and the product $if$ with the imaginary unit is complete. We call an element $g\in E$ exponential if it is of the form $g = e^f$ for some complete $f\in E$. We call a Hausdorff *-algebra $E$ an exponential algebra if the set of exponential elements in $E$ is a neighborhood of $1$.

(ii) A linear group is a set $G$ of invertible elements of some associative algebra $E$ such that $1\in G$ and

$$g, g'\in G \;\Rightarrow\; g^{-1},\ gg'\in G.$$

If $E$ is given with a topology in which its operations are continuous, we consider $G$ as a topological group with the topology induced by calling a subset of $G$ open or closed if it is the intersection of an open or closed set of $E$ with $G$.

(iii) A linear Lie group is a closed subgroup of the group $E^\times$ of all invertible elements of an exponential algebra $E$. A Lie group is a group $G$ with a Hausdorff topology that is isomorphic to some linear Lie group $\hat G$, i.e., for which there is an invertible mapping $\phi: G\to\hat G$ such that $\phi$ and $\phi^{-1}$ are continuous and $\phi(1) = 1$, $\phi(gg') = \phi(g)\phi(g')$ for all $g, g'\in G$.



For all exponential algebras $E$, the group $E^\times$ is a linear Lie group. Note that the law $e^f e^{f'} = e^{f+f'}$ holds if $f$ and $f'$ commute, but not in general. In particular,

$$e^f e^{-f} = e^0 = 1.$$



If $E$ is a Banach algebra, i.e., if the topology of $E$ is induced by a norm $\|\cdot\|$ satisfying $\|fg\|\le\|f\|\,\|g\|$, it is not very difficult to show that every $f\in E$ is complete, that we have for all $f\in E$ the absolutely convergent series expansion

$$e^f = \sum_{k=0}^\infty\frac{f^k}{k!},$$

and that $f = \log g$, where

$$\log g := -\sum_{k=1}^\infty\frac{(1-g)^k}{k} \quad\text{for } \|1-g\| < 1,$$

provides an $f$ such that $g = e^f$. Therefore every Banach algebra is exponential. Note that the exponential mapping, which maps an element $f\in E$ to $e^f\in E^\times$, is usually not surjective.

The above applies to the case $E = \mathbb C^{n\times n}$ with the operator norm induced by the maximum norm on $\mathbb C^n$ (the largest absolute row sum), which is a Banach algebra; this covers all finite-dimensional Lie groups. In infinite dimensions, however, many interesting linear Lie groups are not definable over Banach algebras.
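For $E = \mathbb C^{n\times n}$ (here with real $2\times 2$ matrices), the two series can be tried out directly. The Python sketch below is ours; the series are truncated at a fixed number of terms, which is adequate only for the small norms used. It checks that $\log e^f = f$ for $f$ near $0$, that $e^a e^b \ne e^{a+b}$ for noncommuting $a, b$, and that the largest-entry norm, unlike the induced operator norm, violates $\|fg\|\le\|f\|\,\|g\|$.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def madd(a, b, s=1.0):
    """Return a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(f, terms=60):
    """Truncated series e^f = sum_{k>=0} f^k / k!."""
    result, term = identity(len(f)), identity(len(f))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, f)]   # f^k / k!
        result = madd(result, term)
    return result

def logm(g, terms=200):
    """Truncated series log g = -sum_{k>=1} (1-g)^k / k, valid for ||1 - g|| < 1."""
    n = len(g)
    x = madd(identity(n), g, -1.0)                 # 1 - g
    result, power = [[0.0] * n for _ in range(n)], identity(n)
    for k in range(1, terms):
        power = matmul(power, x)
        result = madd(result, power, -1.0 / k)
    return result

def norm_inf(a):
    """Operator norm induced by the maximum norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in a)

def norm_max(a):
    """Largest absolute entry; not submultiplicative on matrices."""
    return max(abs(x) for row in a for x in row)

def dist(a, b):
    return norm_inf(madd(a, b, -1.0))

f = [[0.1, 0.2], [-0.1, 0.05]]
print(dist(logm(expm(f)), f))                               # tiny: log inverts exp near 1

a, b = [[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]
print(dist(matmul(expm(a), expm(b)), expm(madd(a, b))))     # not small: a, b do not commute

ones = [[1.0, 1.0], [1.0, 1.0]]
print(norm_max(matmul(ones, ones)), norm_max(ones) ** 2)    # 2.0 1.0
```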



8.4 Classical Lie groups and their Lie algebras 

This section is not yet in a good form. 

A matrix group is a linear group in an algebra $\mathbb K^{n\times n}$. In this section, we define the most important matrix groups and the corresponding Lie algebras. Although we get Lie algebras no matter which field is involved, we get Lie groups in the sense defined above only when $\mathbb K$ is the field of real numbers or the field of complex numbers.²

8.4.1 Example. The group $GL(n,\mathbb K)$ of all invertible $n\times n$-matrices over $\mathbb K = \mathbb R$ or $\mathbb K = \mathbb C$ is a linear group. The subgroup of $GL(n,\mathbb K)$ consisting of the matrices with unit determinant is denoted by $SL(n,\mathbb K)$. In other words, $SL(n,\mathbb K)$ is the kernel of the map $\det: GL(n,\mathbb K)\to\mathbb K^*$, where $\mathbb K^*$ is the group of invertible elements in $\mathbb K$. The Lie algebra of $SL(n,\mathbb K)$ is denoted by $sl(n,\mathbb K)$ and consists of the traceless $n\times n$ matrices with entries in $\mathbb K$. By Theorem 8.3.1, the algebra of $n\times n$-matrices with entries in $\mathbb K$ is a Lie algebra with the commutator as Lie product; this Lie algebra is denoted by $gl(n,\mathbb K)$. The center of $gl(n,\mathbb K)$ is easily seen to be the 1-dimensional subalgebra spanned by the identity matrix, $Z(gl(n,\mathbb K)) = \mathbb K1 = \mathbb K$.

Every subspace of a Lie algebra closed under the Lie product is again a Lie algebra. This simple recipe provides a large number of useful Lie algebras defined as Lie subalgebras of some $gl(n,\mathbb K)$. Conversely, the (nontrivial) theorem of Ado, not proven here (but see, e.g., Jacobson [121]), states that every finite-dimensional Lie algebra is isomorphic to a Lie subalgebra of some $gl(n,\mathbb K)$.

² For the real and complex fields, because exponentials can be defined, the groups have a natural differential geometric structure as differentiable manifolds; cf. Section 11.7. For general fields, there are no exponentials, and one needs to replace the differential geometric structure inherent in Lie groups by an algebraic geometry structure, and may then interpret general matrix groups as so-called groups of Lie type. In particular, for finite fields, one gets the Chevalley groups, which figure prominently in the classification of finite simple groups.
The group $GL(n,\mathbb K)$ is one of the most important finite-dimensional linear groups, and all finite-dimensional linear groups are isomorphic to subgroups of $GL(n,\mathbb K)$ for some $n$. If $\mathbb K = \mathbb R$ or $\mathbb K = \mathbb C$, then every closed subgroup $G$ of $GL(n,\mathbb K)$ is a Lie group. These Lie groups have associated Lie algebras $L = \log G$ of infinitesimal generators. For any Lie subgroup $G$ of $GL(n,\mathbb K)$, one gets the Lie algebra by looking at the vector space of those elements $X$ of $gl(n,\mathbb K)$ such that $e^{\varepsilon X}$ is in $G$ for $\varepsilon$ small enough. This criterion is very useful since we can take $\varepsilon$ so small that we only have to look at the terms linear in $\varepsilon$, so that we don't have to expand the exponential series completely. If the subgroup $G\subseteq GL(n,\mathbb K)$ is connected and either compact or nilpotent, then the exponential map can be shown to be surjective; see, e.g., Knapp [137].

The Lie algebra $sl(n,K)$ is the Lie subalgebra of $gl(n,K)$ given by the traceless matrices. Its dimension is $n^2-1$, and we have
$$sl(n,K)\cong gl(n,K)/K.$$
The quotient is well defined and is a Lie algebra because $K$ is the center and thus, in particular, an ideal.

If $L$ is a Lie algebra over $\mathbb R$, then by taking the tensor product with $\mathbb C$ and extending the Lie bracket in a $\mathbb C$-linear way, one obtains the complexification of $L$, denoted $L^{\mathbb C}$. The process of complexification is also called extension of scalars. In particular, if we write $L^{\mathbb C} = \mathbb C\otimes_{\mathbb R}L$, then in $L^{\mathbb C}$ the Lie bracket is given by $(\alpha\otimes x)\angle(\beta\otimes y) = \alpha\beta\otimes(x\angle y)$. The reverse process is called realization or restriction of scalars; we clarify the process of restriction of scalars by an example.

8.4.2 Example. Consider $L = sl(2,\mathbb C)$. We wish to calculate $sl(2,\mathbb C)^{\mathbb R}$. A basis of $sl(2,\mathbb C)$ is given by the elements




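This basis is also a basis for $sl(2,\mathbb R)$; therefore we see that $sl(2,\mathbb C)^{\mathbb R}\cong sl(2,\mathbb R)\oplus_{\mathbb R} i\,sl(2,\mathbb R)$ as real vector spaces. The Lie product of $f+ig$ and $f'+ig'$ for $f,f'\in sl(2,\mathbb R)$ and $ig,ig'\in i\,sl(2,\mathbb R)$ is given by
$$(f+ig)\angle(f'+ig') = (f\angle f' - g\angle g') + i\,(f\angle g' + g\angle f').$$
The reader who already has some experience with Lie algebras is encouraged to verify the isomorphism $sl(2,\mathbb C)^{\mathbb R}\cong so(3,1)$.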

8.4.3 Example. Suppose we have a symmetric bilinear form $B$ on $K^n$. The Lie algebra $so(n,B;K)$ is the subspace of all $f\in sl(n,K)$ satisfying
$$B(fv,w) = -B(v,fw). \quad (8.8)$$
We leave it to the reader to show that if $f$ and $g$ satisfy (8.8), then so does $fg-gf$; thus we have indeed a Lie algebra. In the special case where $B(v,w)=v^Tw$, the Lie algebra $so(n,B;K)$ is called the orthogonal Lie algebra $so(n,K)$ (for $K=\mathbb C$, the complex orthogonal Lie algebra). In matrix language, $so(n,K)$ is the Lie algebra of antisymmetric matrices with entries in $K$ and has dimension $n(n-1)/2$.
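The dimension count and the closure under commutators can be checked directly on the standard basis $E_{ij}-E_{ji}$ ($i<j$) of the antisymmetric matrices. A minimal sketch in plain Python; `so_basis` and `is_antisymmetric` are illustrative helper names, not notation from the text. By bilinearity, closure on a basis implies closure on all of $so(n,K)$.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def is_antisymmetric(a):
    n = len(a)
    return all(a[i][j] == -a[j][i] for i in range(n) for j in range(n))

def so_basis(n):
    """The matrices E_ij - E_ji (i < j), a basis of the antisymmetric n x n matrices."""
    basis = []
    for i in range(n):
        for j in range(i + 1, n):
            m = [[0] * n for _ in range(n)]
            m[i][j], m[j][i] = 1, -1
            basis.append(m)
    return basis

for n in range(2, 6):
    basis = so_basis(n)
    assert len(basis) == n * (n - 1) // 2
    # closure on a basis gives closure on all of so(n, K) by bilinearity
    assert all(is_antisymmetric(commutator(f, g)) for f in basis for g in basis)
print("checked n = 2, ..., 5")
```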

An orthogonal matrix is a matrix $Q$ satisfying
$$Q^TQ = 1. \quad (8.9)$$
The orthogonal $n\times n$ matrices with coefficients in a field $K$ form a subgroup of the group $GL(n,K)$, the orthogonal group $O(n,K)$. Since (8.9) implies that $(\det Q)^2=1$, orthogonal matrices have determinant $\pm1$. The orthogonal matrices of determinant one form a subgroup of $O(n,K)$, the special orthogonal group $SO(n,K)$. The corresponding Lie algebra is $so(n,K) = \log O(n,K) = \log SO(n,K)$, the Lie algebra of antisymmetric $n\times n$ matrices. In particular, the group $SO(3) = SO(3,\mathbb R)$ consists of the rotations in 3-space and was discussed in some detail in Section 1.6.

For a nondegenerate $B$ (i.e., one where $B(v,w)=0$ for all $v$ implies $w=0$) and $K=\mathbb C$ (or any algebraically closed field of characteristic different from 2), we can always choose a basis in which the bilinear form is represented by the identity matrix. Therefore all $so(n,B;K)$ with nondegenerate $B$ are isomorphic to $so(n,K)$.

Over $K=\mathbb R$, symmetric bilinear forms are classified by their signature, i.e., the triple $(p,q,r)$ consisting of the number $p$ of positive, $q$ of negative, and $r$ of zero eigenvalues of the symmetric matrix $A$ representing the bilinear form $B$, $B(v,w)=v^TAw$. The form $B$ is nondegenerate if and only if $r=0$. Bilinear forms with the same signature lead to isomorphic Lie algebras. In particular, $so(p,q)$ denotes a Lie algebra $so(p+q,B;\mathbb R)$, where $B$ is a nondegenerate symmetric bilinear form on $\mathbb R^{p+q}$ of signature $(p,q,0)$. The basis can always be chosen such that the representing matrix $A$ is
$$I_{p,q} := \begin{pmatrix} I_p & 0\\ 0 & -I_q\end{pmatrix},$$
where $I_p$ and $I_q$ are the $p\times p$ and $q\times q$ identity matrices, respectively. In this basis, the Lie algebra $so(p,q)$ is the subalgebra of $gl(n,\mathbb R)$, $n=p+q$, consisting of the elements $f$ satisfying
$$f^TI_{p,q} + I_{p,q}f = 0.$$
Note that if $f\in so(p,q)$, then, since $I_{p,q}^2=1$ and the trace is invariant under transposition and cyclic permutations,
$$0 = \mathrm{tr}\big((f^TI_{p,q}+I_{p,q}f)I_{p,q}\big) = \mathrm{tr}(f^T) + \mathrm{tr}(I_{p,q}fI_{p,q}) = \mathrm{tr}(f) + \mathrm{tr}(fI_{p,q}^2) = 2\,\mathrm{tr}(f),$$
and hence $so(p,q)\subseteq sl(n,\mathbb R)$.
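The trace argument can be illustrated numerically. The sketch below assumes a sampling recipe of our own choosing: $f = I_{p,q}S$ with $S$ antisymmetric, which satisfies $f^TI_{p,q}+I_{p,q}f = -S+S = 0$ because $I_{p,q}^2=1$. It then checks the defining condition, closure under the commutator, and tracelessness for $so(3,1)$; all function names are illustrative.

```python
import random

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def combine(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def i_pq(p, q):
    """The matrix I_{p,q} = diag(I_p, -I_q)."""
    n = p + q
    return [[(1 if i < p else -1) if i == j else 0 for j in range(n)]
            for i in range(n)]

def in_so_pq(f, ipq):
    """Defining condition f^T I_{p,q} + I_{p,q} f = 0."""
    n = len(f)
    lhs = combine(matmul(transpose(f), ipq), matmul(ipq, f))
    return lhs == [[0] * n for _ in range(n)]

def random_so_pq(p, q, rng):
    # Sampling recipe (our choice): f = I_{p,q} S with S antisymmetric.
    # Since I_{p,q}^2 = 1, f^T I_{p,q} + I_{p,q} f = -S + S = 0.
    n = p + q
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s[i][j] = rng.randint(-4, 4)
            s[j][i] = -s[i][j]
    return matmul(i_pq(p, q), s)

rng = random.Random(2)
p, q = 3, 1
ipq = i_pq(p, q)
f, g = random_so_pq(p, q, rng), random_so_pq(p, q, rng)
bracket = combine(matmul(f, g), matmul(g, f), sign=-1)
print(in_so_pq(f, ipq), in_so_pq(bracket, ipq))  # True True
print(sum(f[i][i] for i in range(p + q)))         # 0
```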

8.4.4 Example. Let $V$ be a vector space over a field $K$. Suppose $V$ is equipped with a symmetric or antisymmetric nondegenerate bilinear form $B$. There is a symmetry group associated to the bilinear form, consisting of the linear transformations $Q: V\to V$ such that
$$B(Qv,Qw) = B(v,w)$$
for all $v,w$ in $V$. If $B$ is symmetric, one calls the group of these linear transformations an orthogonal group and denotes it by $O(B,K)$. The associated Lie algebra is $o(B,K)$, consisting of the linear maps $f$ satisfying (8.8). Indeed, for small $t$,
$$B(e^{tf}x,e^{tf}y) = B((1+tf)x,(1+tf)y)+O(t^2) = B(x,y)+t\big(B(fx,y)+B(x,fy)\big)+O(t^2),$$
so $e^{tf}$ preserves $B$ to first order in $t$ precisely when $B(fx,y) = -B(x,fy)$ for all $x,y$.

8.4.5 Example. When $K=\mathbb R$, one has a further subdivision for symmetric bilinear forms, since a nondegenerate $B$ has a well-defined signature $(p,q)$, where $p+q$ is the dimension of $V$. If $B$ is of signature $(p,q)$, this means that there exists a basis of $V$ in which $B$ can be represented as
$$B(v,w) = v^TAw,\quad\text{where } A = \mathrm{diag}(\underbrace{-1,\dots,-1}_{p\text{ times}},\underbrace{1,\dots,1}_{q\text{ times}}).$$
The group of all linear transformations that leave $B$ invariant is denoted by $O(p,q)$. The subgroup of $O(p,q)$ of transformations with determinant one is the so-called special orthogonal group and is denoted by $SO(p,q)$. The associated real Lie algebra is denoted $so(p,q)$, and its elements are linear transformations $A: V\to V$ such that for all $v,w\in V$ we have $B(Av,w)+B(v,Aw)=0$. The Lie product is given by the commutator of matrices.

The group of all translations in $V$ generates, together with $SO(p,q)$, the group of inhomogeneous special orthogonal transformations, which is denoted $ISO(p,q)$. One can obtain $ISO(p,q)$ from $SO(p,q+1)$ by performing a contraction; that is, by rescaling some generators with some parameter $\varepsilon$ and then taking a singular limit $\varepsilon\to0$ or $\varepsilon\to\infty$. The group $ISO(p,q)$ can also be seen as the group of $(p+q+1)\times(p+q+1)$ matrices of the form
$$\begin{pmatrix} Q & b\\ 0 & 1\end{pmatrix}\qquad\text{with } Q\in SO(p,q),\ b\in V.$$
The Lie algebra of $ISO(p,q)$ is denoted $iso(p,q)$ and can be described as the Lie algebra of $(p+q+1)\times(p+q+1)$ matrices of the form
$$\begin{pmatrix} A & b\\ 0 & 0\end{pmatrix}\qquad\text{with } A\in so(p,q),\ b\in V.$$
Again, the Lie product in $iso(p,q)$ is the commutator of matrices.




We define the symplectic Lie algebra $sp(2n,K)$ as the Lie subalgebra of $gl(2n,K)$ given by the elements $f$ satisfying
$$fJ + Jf^T = 0, \quad (8.10)$$
where $J$ is the $2n\times2n$ matrix given by
$$J = \begin{pmatrix} 0 & I_n\\ -I_n & 0\end{pmatrix}.$$
We leave it to the reader to verify that if $f$ and $g$ satisfy (8.10), then so does $fg-gf$. Another useful exercise is to prove $sl(2,K) = sp(2,K)$. (Caution: The reader is warned that in the literature there are different notational conventions concerning the symplectic Lie algebras. For example, some people write $sp(n,K)$ for what we and many others call $sp(2n,K)$.)
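Both exercises can be spot-checked numerically. This is a sketch under stated assumptions: $J$ is taken in the block form above, and elements of $sp(2n,K)$ are sampled as $f = SJ$ with $S$ symmetric (our own recipe; then $fJ = -S$ and $Jf^T = JJ^TS = S$). The last line confirms, on a grid of small integer $2\times2$ matrices, that (8.10) holds exactly for the traceless ones.

```python
import itertools
import random

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def combine(a, b, sign=1):
    return [[x + sign * y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def standard_j(n):
    """J = [[0, I_n], [-I_n, 0]] of size 2n x 2n."""
    m = [[0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        m[k][n + k] = 1
        m[n + k][k] = -1
    return m

def in_sp(f, j):
    """Defining condition (8.10): f J + J f^T = 0."""
    size = len(f)
    lhs = combine(matmul(f, j), matmul(j, transpose(f)))
    return lhs == [[0] * size for _ in range(size)]

def random_sp(n, rng):
    # Sampling recipe (our choice): f = S J with S symmetric.
    # Then f J = S J J = -S and J f^T = J J^T S = S, so (8.10) holds.
    size = 2 * n
    s = [[0] * size for _ in range(size)]
    for i in range(size):
        for k in range(i, size):
            s[i][k] = s[k][i] = rng.randint(-3, 3)
    return matmul(s, standard_j(n))

rng = random.Random(3)
j4 = standard_j(2)
f, g = random_sp(2, rng), random_sp(2, rng)
bracket = combine(matmul(f, g), matmul(g, f), sign=-1)
print(in_sp(f, j4), in_sp(bracket, j4))  # True True

# sp(2, K) = sl(2, K): a 2 x 2 matrix satisfies (8.10) exactly when it is traceless
j2 = standard_j(1)
print(all(in_sp([[a, b], [c, d]], j2) == (a + d == 0)
          for a, b, c, d in itertools.product(range(-2, 3), repeat=4)))  # True
```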

If $B$ is antisymmetric in Example 8.4.4, the group is called a symplectic group and one writes $Sp(B,K)$. The associated Lie algebra is $sp(B,K)$. If $V$ is of finite dimension $m$, one writes $Sp(B,K) = Sp(m,K)$. Note that $m$ is necessarily even.

Other real Lie algebras that play a major role in many areas of physics are the unitary Lie algebras and the special unitary Lie algebras, so called because they are the generating algebras of the groups of (special) unitary matrices, a term that will be explained in Section 11.7. The unitary Lie algebra $u(n)$ consists of all antihermitian complex $n\times n$ matrices. The special unitary Lie algebra $su(n)$ consists of the antihermitian traceless complex $n\times n$ matrices. Clearly, $su(n)\subseteq u(n)$. It might seem weird to call a Lie algebra real if it consists of complex-valued matrices. However, the antihermitian complex $n\times n$ matrices form a real vector space, but not a complex one: if $f$ is an antihermitian matrix, then $if$ is Hermitian. The dimension (as a real vector space) of $su(n)$ is $n^2-1$, and the dimension of $u(n)$ is $n^2$. It is a good exercise to check that $so(3)\cong su(2)$, since these two Lie algebras will return very often. A hint: $so(3)$ consists of antisymmetric real $3\times3$ matrices, so there are only three linearly independent ones. Choosing an obvious basis for both $su(2)$ and $so(3)$ will do the job.
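One possible choice of "obvious" bases, assumed here for illustration, is the rotation generators $L_k$ with $(L_k)_{ij} = -\epsilon_{kij}$ for $so(3)$ and $e_k = -\tfrac{i}{2}\sigma_k$, built from the Pauli matrices $\sigma_k$, for $su(2)$. The sketch checks that both triples satisfy the same relations $[X_j,X_k] = \sum_l \epsilon_{jkl}X_l$, so that $e_k\mapsto L_k$ extends to a Lie algebra isomorphism.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def combo(coeffs, basis):
    """The linear combination sum_l coeffs[l] * basis[l]."""
    n = len(basis[0])
    return [[sum(c * m[i][j] for c, m in zip(coeffs, basis)) for j in range(n)]
            for i in range(n)]

def eps(j, k, l):
    """Levi-Civita symbol for indices 0, 1, 2."""
    return {(0, 1, 2): 1, (1, 2, 0): 1, (2, 0, 1): 1,
            (0, 2, 1): -1, (2, 1, 0): -1, (1, 0, 2): -1}.get((j, k, l), 0)

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for r1, r2 in zip(a, b) for x, y in zip(r1, r2))

def has_so3_relations(basis):
    """Check [X_j, X_k] = sum_l eps_jkl X_l for all j, k."""
    return all(close(commutator(basis[j], basis[k]),
                     combo([eps(j, k, l) for l in range(3)], basis))
               for j in range(3) for k in range(3))

# so(3): generators of rotations about the three coordinate axes
so3 = [[[0, 0, 0], [0, 0, -1], [0, 1, 0]],
       [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
       [[0, -1, 0], [1, 0, 0], [0, 0, 0]]]

# su(2): e_k = -(i/2) sigma_k, built from the Pauli matrices sigma_k
pauli = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
su2 = [[[-0.5j * x for x in row] for row in s] for s in pauli]

print(has_so3_relations(so3), has_so3_relations(su2))  # True True
```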

8.4.6 Example. A complex matrix $U$ is unitary if it satisfies
$$UU^* = 1,$$
where $(U^*)_{ij} = \overline{U_{ji}}$. Since the inverse of a matrix is unique, it follows that also $U^*U=1$. By splitting all the matrix entries into real and imaginary parts, $U_{ij} = A_{ij}+iB_{ij}$, we see that the set of $n\times n$ unitary matrices makes up a submanifold of $\mathbb R^{2n^2}$ of dimension $n^2$. The linear group of unitary $n\times n$ matrices is denoted $U(n)$.

To find its Lie algebra, write $U = e^A = \sum_{k=0}^{\infty}\frac{A^k}{k!}$. Then multiply $A$ by a parameter $t$, take $t\to0$, and keep only the linear terms: $U = 1+tA+O(t^2)$. Since $U$ has to be unitary, we obtain
$$1 = (1+tA+O(t^2))(1+tA+O(t^2))^* = 1+t(A+A^*)+O(t^2),$$
implying that $A$ has to be antihermitian. Thus the Lie algebra of infinitesimal generators of $U(n)$ is $u(n)$.

The subgroup of $U(n)$ of all elements with determinant 1 is denoted by $SU(n)$ and is called the special unitary group. The dimension of $SU(n)$ is $n^2-1$. For the determinant we get
$$\det U = 1 + t\,\mathrm{tr}\,A + O(t^2),$$
and thus the trace of infinitesimal generators of $SU(n)$ has to vanish, and we see that the corresponding Lie algebra is $su(n)$. Note that the Lie algebra $u(n)$ contains all real multiples of $i1$, which commute with all other elements. Hence $u(n)$ has a nontrivial center, whereas $su(n)$ does not.

In the case $n=2$ it is a nice exercise to show that each special unitary matrix $U$ can be written as
$$U = \begin{pmatrix} x & -\bar y\\ y & \bar x\end{pmatrix},\qquad x,y\in\mathbb C,\quad |x|^2+|y|^2 = 1.$$
Writing $x=a+ib$ and $y=c+id$ with $a,b,c,d\in\mathbb R$, we see that $a^2+b^2+c^2+d^2=1$. This implies that there is a one-to-one correspondence between $SU(2)$ and the set of points on the unit sphere in $\mathbb R^4$. Thus $SU(2)$ is, as a manifold, homeomorphic to $S^3$. In particular, $SU(2)$ is compact and connected. Hence every element $U$ of $SU(2)$ can be written as the exponential $e^A$ of a matrix $A\in su(2)$.
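The parametrization can be spot-checked numerically: take a point $(a,b,c,d)$ on the unit sphere $S^3$ (the angles below are arbitrary illustrative values), form $x=a+ib$, $y=c+id$, and verify that the resulting matrix is unitary with determinant 1, up to floating-point rounding.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def conj_transpose(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def su2_matrix(x, y):
    """[[x, -conj(y)], [y, conj(x)]]; special unitary when |x|^2 + |y|^2 = 1."""
    return [[x, -y.conjugate()], [y, x.conjugate()]]

# an arbitrary point (a, b, c, d) on the unit sphere S^3 in R^4
chi, theta, phi = 0.7, 1.1, 2.3
a = math.cos(chi)
b = math.sin(chi) * math.cos(theta)
c = math.sin(chi) * math.sin(theta) * math.cos(phi)
d = math.sin(chi) * math.sin(theta) * math.sin(phi)
u = su2_matrix(complex(a, b), complex(c, d))

uu = matmul(u, conj_transpose(u))
det = u[0][0] * u[1][1] - u[0][1] * u[1][0]
print(all(abs(uu[i][j] - (i == j)) < 1e-12 for i in range(2) for j in range(2)))  # True
print(abs(det - 1) < 1e-12)  # True
```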

Physicists prefer to work with Lie algebras defined by Hermitian matrices, corresponding to Lie *-algebras. In the applications, distinguished real generators typically represent important real-valued observables. Therefore physicists tend to write an antihermitian generator in the form $iA$ with a Hermitian matrix $A$. This is one of the reasons why the structure constants for real algebras appear in the physics literature with an $i$, as alluded to at the end of Section 8.2.



8.5 Heisenberg algebras and Heisenberg groups 

A Heisenberg algebra is a Lie algebra $L$ with a 1-dimensional center and a distinguished Lie central element $1$, called one or identity, such that every $f\angle g$ is a multiple of $1$ for all $f,g\in L$. There is an embedding of $K$ into $L$ given by $\alpha\mapsto\alpha1$, which can be used to identify the multiples of $1$ with the multipliers from the field, so that $K = Z(L)\subseteq L$.

When we divide out the center of a Heisenberg algebra, we obtain an abelian Lie algebra. More generally, let $L$ be any Lie algebra and let $L'$ be another Lie algebra with a subalgebra $Z$ contained in the center of $L'$. If $L'/Z$ is isomorphic to $L$, one calls $L'$ a central extension of $L$.³

Corresponding to any Heisenberg algebra there is an alternating bilinear form $\omega: L\times L\to K$ given by
$$f\angle g = \omega(f,g).$$

³In more abstract terms, central extensions are conveniently described by short exact sequences. Let $A_i$ be a family of Lie algebras and suppose that there are homomorphisms $d_i: A_i\to A_{i+1}$,
$$\cdots\longrightarrow A_i\xrightarrow{\;d_i\;}A_{i+1}\xrightarrow{\;d_{i+1}\;}\cdots \quad (8.11)$$
We call the sequence exact if $\ker d_i = \mathrm{im}\,d_{i-1}$ for all $i$ for which both $d_{i-1}$ and $d_i$ are defined. As an exercise, the reader is invited to verify the following assertion: the sequence $0\to A\to B\to0$ is exact if and only if $A\cong B$ and the isomorphism is the map from $A$ to $B$. A short exact sequence is a sequence of maps of the form
$$0\to A\to B\to C\to0.$$
A central extension of $L$ is then a Lie algebra $L'$ such that there is a short exact sequence $0\to Z\to L'\to L\to0$ with $Z$ abelian and mapped into the center of $L'$.
Conversely, given such a form $\omega$ on an arbitrary vector space $V$ not containing $1$, this formula turns $L := K\oplus V$ into a Heisenberg algebra. If $\omega$ is nondegenerate on $V$, it defines a symplectic form on $V$.

The Heisenberg algebra $h(n)$ is the special case where $K=\mathbb C$, $V=\mathbb C^{2n}$, and $\omega$ is nondegenerate. Thus $h(n)$ is a central extension of the abelian Lie algebra $\mathbb C^{2n}$ and has dimension $2n+1$. We can find a basis of $V$ consisting of vectors $p_k$ and $q_l$ for $1\le k,l\le n$ such that $\omega(p_k,p_l) = \omega(q_k,q_l) = 0$ for all $k,l$ and $\omega(p_k,q_l) = \delta_{kl}$; that is, $\omega$ is then the standard symplectic form on $K^{2n}$, represented by the matrix $\begin{pmatrix}0 & I_n\\ -I_n & 0\end{pmatrix}$. Thus Heisenberg algebras encode symplectic vector spaces in a Lie algebra setting. Everything done here extends, with appropriate definitions, to general symplectic manifolds, and indeed much of classical mechanics can be phrased in terms of symplectic geometry, the geometry of such manifolds; we refer the reader to the exposition by Arnold [12] on classical mechanics and symplectic geometry.

8.5.1 Example. Let us write $t(n,K)$ for the Lie subalgebra of $gl(n,K)$ consisting of upper triangular matrices and $n(n,K)$ for the Lie subalgebra of $gl(n,K)$ consisting of strictly upper triangular matrices, which have zeros on the diagonal.

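The Lie algebra $t(3,K)$ of strictly upper triangular $3\times3$ matrices is a Heisenberg algebra with
$$1 = \begin{pmatrix}0&0&1\\0&0&0\\0&0&0\end{pmatrix},$$
since
$$\begin{pmatrix}0&\alpha&\gamma\\0&0&\beta\\0&0&0\end{pmatrix}\angle\begin{pmatrix}0&\alpha'&\gamma'\\0&0&\beta'\\0&0&0\end{pmatrix} = \begin{pmatrix}0&0&\alpha\beta'-\beta\alpha'\\0&0&0\\0&0&0\end{pmatrix} = \alpha\beta'-\beta\alpha'.$$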

The Lie algebra $t(3,\mathbb C)$ is called the Heisenberg algebra; thus if one talks about "the" (rather than "a") Heisenberg algebra, this Lie algebra is meant; it is denoted $h(1)$. Introducing names for the special matrices
$$p := \begin{pmatrix}0&1&0\\0&0&0\\0&0&0\end{pmatrix},\qquad q := \begin{pmatrix}0&0&0\\0&0&1\\0&0&0\end{pmatrix},$$
we find that $p$, $q$ and $1$ form a basis of $t(3,\mathbb C)$, and we can express the Lie product in the more compact form
$$(\alpha p+\beta q+\gamma)\angle(\alpha'p+\beta'q+\gamma') = \alpha\beta'-\beta\alpha'. \quad (8.12)$$
Defining
$$(\alpha p+\beta q+\gamma)^* := \bar\alpha p+\bar\beta q+\bar\gamma$$
turns the Heisenberg algebra into a Lie *-algebra in which $p$ and $q$ are Hermitian. Note that here $*$ is not the conjugate transposition of matrices!
(8.12) implies that $p$ and $q$ satisfy the so-called canonical commutation relations
$$p\angle q = 1,\qquad p\angle p = q\angle q = 0. \quad (8.13)$$
Since $f\angle1 = 0$ when $1$ is Lie central, (8.13) completely specifies the Lie product. The canonical commutation relations are frequently found in textbooks on quantum mechanics, but we see that they just characterize the Heisenberg algebra.
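In the matrix realization above, with the Lie product of $t(3,\mathbb C)$ taken to be the matrix commutator, the relations (8.12) and (8.13) can be verified exactly with integer arithmetic. The helper `element`, which builds $\alpha p+\beta q+\gamma1$, is an illustrative name of ours.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def element(alpha, beta, gamma):
    """alpha p + beta q + gamma 1 as a strictly upper triangular 3 x 3 matrix."""
    return [[0, alpha, gamma], [0, 0, beta], [0, 0, 0]]

p, q, one = element(1, 0, 0), element(0, 1, 0), element(0, 0, 1)
zero = element(0, 0, 0)

# canonical commutation relations (8.13), and 1 is Lie central
print(commutator(p, q) == one, commutator(p, p) == zero, commutator(q, q) == zero)
print(commutator(p, one) == zero, commutator(q, one) == zero)

# formula (8.12) for random integer coefficients
rng = random.Random(4)
al, be, ga, al2, be2, ga2 = (rng.randint(-9, 9) for _ in range(6))
print(commutator(element(al, be, ga), element(al2, be2, ga2))
      == element(0, 0, al * be2 - be * al2))
```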

The notation $q$ and $p$ is chosen to remind of position and momentum. Indeed, the canonical commutation relations arise naturally in classical mechanics. In the Lie algebra $C^\infty(\mathbb R\times\mathbb R)$ constructed in Theorem 8.1.3, we consider the set of affine functions, that is, those of the form $f(p,q) = \alpha_fp+\beta_fq+\gamma_f$, with $\alpha_f,\beta_f,\gamma_f\in\mathbb C$. In particular, the constant functions are included with $\alpha_f=\beta_f=0$, and we identify them with the constants $\gamma_f\in\mathbb C$. Given another affine function $g(p,q) = \alpha_gp+\beta_gq+\gamma_g$, we find
$$f\angle g = \alpha_f\beta_g-\beta_f\alpha_g\in\mathbb C.$$
Since $f\angle g$ is just a complex number times the function that is 1 everywhere, it is a central element, that is, it Lie commutes with all other algebra elements. Thus the affine functions form a Heisenberg subalgebra of $C^\infty(\mathbb R\times\mathbb R)$, and $p$ and $q$ satisfy the canonical commutation relations.



Suppose that a commutative Poisson algebra $E$ contains two elements $p$ and $q$ satisfying the canonical commutation relations (8.13). Then $E$ contains a copy of the Heisenberg algebra. The algebra of polynomials in $p$ and $q$ is then a Poisson subalgebra of $E$ in which (8.6) is valid. This follows from Proposition 9.1.5. Thus the canonical commutation relations capture the essence of the commutative Poisson algebra $C^\infty(\mathbb R\times\mathbb R)$. But getting the bigger algebra requires taking limits, which need not exist in $E$, since with polynomials alone one does not get all functions.
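The polynomial Poisson algebra in $p$ and $q$ is small enough to implement exactly. In the sketch below, a polynomial is stored as a dictionary mapping exponent pairs $(i,j)$ to rational coefficients of $p^iq^j$ (an ad hoc representation of our own), and the Lie product is the bracket $f_pg_q-f_qg_p$ of Example 9.1.3. It checks the canonical commutation relations, that affine functions have constant brackets, the Leibniz identity (9.1), and the power rule of Proposition 9.1.5 on sample polynomials.

```python
from collections import defaultdict
from fractions import Fraction

# A polynomial sum_{i,j} c_ij p^i q^j is stored as {(i, j): c_ij}.

def clean(f):
    return {k: c for k, c in f.items() if c != 0}

def add(f, g):
    h = defaultdict(Fraction)
    for k, c in list(f.items()) + list(g.items()):
        h[k] += c
    return clean(h)

def scale(c, f):
    return clean({k: c * v for k, v in f.items()})

def mul(f, g):
    h = defaultdict(Fraction)
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            h[(i1 + i2, j1 + j2)] += c1 * c2
    return clean(h)

def d_p(f):
    return clean({(i - 1, j): i * c for (i, j), c in f.items() if i > 0})

def d_q(f):
    return clean({(i, j - 1): j * c for (i, j), c in f.items() if j > 0})

def lie(f, g):
    """Poisson bracket f_p g_q - f_q g_p."""
    return add(mul(d_p(f), d_q(g)), scale(-1, mul(d_q(f), d_p(g))))

def power(f, n):
    result = {(0, 0): Fraction(1)}
    for _ in range(n):
        result = mul(result, f)
    return result

def affine(alpha, beta, gamma):
    return clean({(1, 0): Fraction(alpha), (0, 1): Fraction(beta), (0, 0): Fraction(gamma)})

P, Q, ONE = {(1, 0): Fraction(1)}, {(0, 1): Fraction(1)}, {(0, 0): Fraction(1)}
print(lie(P, Q) == ONE, lie(P, P) == {}, lie(Q, Q) == {})   # True True True
print(lie(affine(2, 3, 5), affine(-1, 4, 7)) == {(0, 0): 11})  # True: 2*4 - 3*(-1)

f = add(mul(P, P), scale(3, Q))   # p^2 + 3q
g = add(P, mul(Q, mul(Q, Q)))     # p + q^3
h = add(mul(P, Q), ONE)           # pq + 1
print(lie(f, mul(g, h)) == add(mul(lie(f, g), h), mul(g, lie(f, h))))  # True
print(lie(f, power(g, 4)) == scale(4, mul(power(g, 3), lie(f, g))))    # True
```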

8.5.2 Example. An upper triangular $n\times n$ matrix is called unit upper triangular if its elements on the diagonal are 1, and strictly upper triangular if its elements on the diagonal are zero. It is straightforward to check that the unit upper triangular $n\times n$ matrices form a subgroup $T(n,K)$ of the group $GL(n,K)$, and the strictly upper triangular $n\times n$ matrices form a Lie subalgebra of $gl(n,K)$, which we denote by $t(n,K)$. We have $t(n,K) = \log T(n,K)$. In the following we shall look more closely at the case $n=3$, which is especially important.



The Heisenberg group is the group
$$T(3,\mathbb C) = \left\{\begin{pmatrix}1&a&c\\0&1&b\\0&0&1\end{pmatrix}\;\middle|\; a,b,c\in\mathbb C\right\} \quad (8.14)$$
of unit upper triangular matrices in $\mathbb C^{3\times3}$; its Lie algebra is the Heisenberg algebra $t(3,\mathbb C)$. Since the Heisenberg group is defined in terms of matrices, it comes immediately with a representation, the defining representation. Note that the defining representation is not unitary.
The relation between the Heisenberg algebra and the Heisenberg group is particularly simple, since the exponential map has a simple form. Indeed, if $A\in\mathbb C^{n\times n}$, then
$$e^A = \sum_{k=0}^{\infty}\frac{A^k}{k!}, \quad (8.15)$$
where $A^0=1$ is the identity matrix and the series (8.15) is absolutely convergent. A note on the infinite-dimensional case: for linear operators $A$ on a Hilbert space $\mathbb H$, the series converges absolutely only when $A$ is bounded (and hence everywhere defined); for unbounded but self-adjoint $A$ (which are only densely defined), convergence holds in a weaker sense, giving
$$e^A\psi = \sum_{k=0}^{\infty}\frac{A^k\psi}{k!} \quad (8.16)$$
for a dense set of vectors $\psi\in\mathbb H$ that are analytic for $A$.

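If $A\in t(3,\mathbb C)$, then a direct calculation shows that $A^2$ is of the form
$$A^2 = \begin{pmatrix}0&0&c\\0&0&0\\0&0&0\end{pmatrix}$$
for some $c\in\mathbb C$. Hence $A^3=0$, and the exponential of $A$ is simply given by $e^A = 1+A+\tfrac12A^2$. Thus if $A$ is given by
$$A = \begin{pmatrix}0&\alpha&\gamma\\0&0&\beta\\0&0&0\end{pmatrix},$$
the exponential is given by
$$e^A = \begin{pmatrix}1&\alpha&\gamma+\tfrac12\alpha\beta\\0&1&\beta\\0&0&1\end{pmatrix}.$$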
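The map $A\mapsto e^A$ from $t(3,\mathbb C)$ to $T(3,\mathbb C)$ is clearly bijective. The inverse map is given by the logarithm, which for matrices is defined by
$$\log(1+X) = \sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}X^k, \quad (8.17)$$
so that for the Heisenberg group $G$ we have, since $(X-1)^3=0$ for $X\in G$,
$$\log X = (X-1)-\tfrac12(X-1)^2 = -\tfrac32+2X-\tfrac12X^2.$$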
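Because both series terminate, the exponential and the logarithm on the Heisenberg group can be computed exactly with rational arithmetic. A minimal sketch using Python's `fractions.Fraction`; the helper names `lin`, `exp_t3` and `log_t3` are ours, and the sample entries $\alpha,\beta,\gamma$ are arbitrary.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def lin(*terms):
    """Linear combination of 3 x 3 matrices, given as (coefficient, matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(3)] for i in range(3)]

ONE = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]

def exp_t3(a):
    """e^A = 1 + A + A^2/2, exact since A^3 = 0 for strictly upper triangular A."""
    return lin((1, ONE), (1, a), (Fraction(1, 2), matmul(a, a)))

def log_t3(x):
    """log X = -3/2 + 2X - X^2/2, exact for unit upper triangular 3 x 3 matrices X."""
    return lin((Fraction(-3, 2), ONE), (2, x), (Fraction(-1, 2), matmul(x, x)))

alpha, beta, gamma = Fraction(2), Fraction(-3), Fraction(5, 7)
A = [[0, alpha, gamma], [0, 0, beta], [0, 0, 0]]
X = exp_t3(A)
print(X[0][2] == gamma + alpha * beta / 2)  # True: top-right entry of e^A
print(log_t3(X) == A)                        # True: log undoes exp on t(3, C)
```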

We are thus in the situation that both $T(3,\mathbb C) = \exp t(3,\mathbb C)$ and $t(3,\mathbb C) = \log T(3,\mathbb C)$. This is not special to the Heisenberg group, nor does it hold in general. But there is a class of groups for which this holds: for example, the exponential map is surjective for all connected Lie groups that are compact or nilpotent (see below); see, e.g., Helgason [111] or Knapp [137]. The Heisenberg group is a noncompact but nilpotent Lie group.



Let us briefly recall what it means for a group to be nilpotent. Given any group $G$, we can form the commutator subgroup $G^{(1)}$, which is generated by all elements of the form $aba^{-1}b^{-1}$ for $a,b\in G$. Next we form the subgroup $G^{(2)}$ generated by all elements $aba^{-1}b^{-1}$ with $a\in G$ and $b\in G^{(1)}$. Repeating this procedure, always taking commutators with elements of $G$, we get a sequence of groups
$$G\supseteq G^{(1)}\supseteq G^{(2)}\supseteq\cdots$$
A group is nilpotent if the procedure ends after a finite number of steps with the trivial group $G^{(n)}=1$. It is easy to see that the Heisenberg group is two-step nilpotent, since $G^{(2)}=1$.
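For the Heisenberg group the two steps can be seen explicitly. The commutator of $x=(a,b,c)$ and $y=(a',b',c')$ in the parametrization (8.14) is $1+(ab'-a'b)E_{13}$, which lies in the center, so every commutator with it is trivial. A sketch with exact rational entries; `elem`, `inverse` and `group_commutator` are illustrative helper names.

```python
from fractions import Fraction
import random

def elem(a, b, c):
    """The Heisenberg group element [[1, a, c], [0, 1, b], [0, 0, 1]]."""
    return [[1, a, c], [0, 1, b], [0, 0, 1]]

def inverse(x):
    a, b, c = x[0][1], x[1][2], x[0][2]
    return elem(-a, -b, a * b - c)

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def group_commutator(x, y):
    """x y x^{-1} y^{-1}."""
    return matmul(matmul(x, y), matmul(inverse(x), inverse(y)))

rng = random.Random(5)

def random_elem():
    return elem(*(Fraction(rng.randint(-9, 9), rng.randint(1, 5)) for _ in range(3)))

x, y, z = random_elem(), random_elem(), random_elem()
k = group_commutator(x, y)
# k = 1 + (a b' - a' b) E_13 lies in the center ...
print(k == elem(0, 0, x[0][1] * y[1][2] - y[0][1] * x[1][2]))  # True
# ... so its commutator with any element is trivial, i.e. G^(2) = 1
print(group_commutator(z, k) == elem(0, 0, 0))  # True
```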

Since the exponential map is bijective for the Heisenberg group, there exists a binary operation $\oplus$ on $t(3,\mathbb C)$, where $A\oplus B$ is the element with
$$e^Ae^B = e^{A\oplus B}. \quad (8.18)$$
It is not difficult to give an explicit formula for $A\oplus B$. Since $A$ and $B$ are strictly upper triangular, we have $A^pB^q=0$ for $p+q\ge3$. We thus have
$$e^Ae^B = \big(1+A+\tfrac12A^2\big)\big(1+B+\tfrac12B^2\big) = 1+A+B+\tfrac12(A^2+B^2+2AB).$$
Applying (8.17), we find
$$A\oplus B = \log\big(1+A+B+\tfrac12(A^2+B^2+2AB)\big) = A+B+\tfrac12(AB-BA),$$
hence
$$A\oplus B = A+B+\tfrac12A\angle B. \quad (8.19)$$
Thus we get from (8.18) the formula $e^Ae^B = e^{A+B+\frac12A\angle B}$. Since $A\angle B$ is central, it behaves just like a complex number, and we find the Weyl relations
$$e^{A+B} = e^{-\frac12A\angle B}e^Ae^B. \quad (8.20)$$
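Formulas (8.18) to (8.20) can be confirmed exactly for random rational elements of $t(3,\mathbb C)$, taking the Lie product to be the matrix commutator $AB-BA$. The sketch reuses the terminating exponential $1+A+\tfrac12A^2$; the names `oplus` and `random_t3` are ours.

```python
from fractions import Fraction
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def lin(*terms):
    """Linear combination of 3 x 3 matrices, given as (coefficient, matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(3)] for i in range(3)]

ONE = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]

def exp_t3(a):
    """e^A = 1 + A + A^2/2 for strictly upper triangular 3 x 3 matrices A."""
    return lin((1, ONE), (1, a), (Fraction(1, 2), matmul(a, a)))

def oplus(a, b):
    """A (+) B = A + B + (1/2)(AB - BA), formula (8.19)."""
    return lin((1, a), (1, b), (Fraction(1, 2), matmul(a, b)), (Fraction(-1, 2), matmul(b, a)))

rng = random.Random(6)

def random_t3():
    """Random strictly upper triangular 3 x 3 matrix with rational entries."""
    r = [Fraction(rng.randint(-9, 9), rng.randint(1, 4)) for _ in range(3)]
    return [[0, r[0], r[1]], [0, 0, r[2]], [0, 0, 0]]

A, B = random_t3(), random_t3()
print(matmul(exp_t3(A), exp_t3(B)) == exp_t3(oplus(A, B)))  # True: (8.18)

# Weyl relation (8.20), with the Lie product A angle B = AB - BA
C = lin((1, matmul(A, B)), (-1, matmul(B, A)))
lhs = exp_t3(lin((1, A), (1, B)))
rhs = matmul(exp_t3(lin((Fraction(-1, 2), C))), matmul(exp_t3(A), exp_t3(B)))
print(lhs == rhs)  # True
```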

In fact this result is also a direct consequence of the famous (but much less elementary) Baker-Campbell-Hausdorff (BCH) formula, which gives, for general matrix Lie groups, a series expansion of $A\oplus B$ when $A$ and $B$ are not too large. Even more generally, the Baker-Campbell-Hausdorff formula applies to abstract finite-dimensional Lie groups⁴ that are not necessarily matrix groups, and says that for two fixed Lie algebra elements $A$ and $B$ and for small enough real numbers $s$ and $t$ there is a function $C$ from $\mathbb R\times\mathbb R$ to the Lie algebra such that
$$e^{sA}e^{tB} = e^{C(s,t)}.$$
The function $C(s,t)$ is given by a (for small $s,t$ absolutely convergent) infinite power series, the first terms of which are
$$C(s,t) = sA+tB+\frac{st}{2}A\angle B+\frac{st}{12}\big(sA\angle(A\angle B)-tB\angle(A\angle B)\big)+\cdots.$$
In fact, this series expansion may be derived from a closed-form integral expression.

The Baker-Campbell-Hausdorff formula is of great importance in both pure and applied mathematics. It gives (where it applies; in particular, in finite dimensions) the relation of a Lie group with the associated Lie algebra. For example, it says that the product of $e^A$ and $e^B$ for some $A$ and $B$ in the Lie algebra is again an element of the form $e^C$ with $C$ in the Lie algebra. Hence the exponentials of the Lie algebra generate a subgroup of the corresponding Lie group.

⁴In infinite dimensions, additional assumptions are needed for the BCH formula to hold.

For infinite-dimensional Lie algebras and groups, one has to use a refined argument centering around the Hille-Yosida theorem. Let $U(t)$ denote a one-parameter group of linear operators on a Hilbert space $\mathbb H$ such that $t\mapsto U(t)$ is strongly continuous, which means that $t\mapsto U(t)\varphi$ is continuous for all $\varphi\in\mathbb H$. Then we can differentiate $U(t)$ to obtain the strong limit
$$A\varphi = \lim_{t\to0}\frac{U(t)\varphi-\varphi}{t}.$$
The object $A$ is called the infinitesimal generator of the one-parameter group $U(t)$. It turns out that $A$ is a closed linear operator that is defined on a dense subspace of $\mathbb H$. The Hille-Yosida theorem gives a necessary and sufficient condition for a closed linear operator $A$ to be the infinitesimal generator of some strongly continuous one-parameter semigroup
$$U(t) = e^{tA},$$
since in general one might not get a group. The Hille-Yosida theorem is very useful for analyzing the solvability of linear differential equations
$$\frac{d}{dt}\psi(t) = A\psi(t),\qquad \psi(0) = \psi_0,$$
examples of which are the Schrödinger equation or the heat equation. If the conditions of the Hille-Yosida theorem hold for $A$, the solution to this initial value problem takes the form
$$\psi(t) = e^{tA}\psi_0.$$
For the (hyperbolic, conservative) Schrödinger equation, $A = -\frac{i}{\hbar}H$ with a self-adjoint Hamiltonian $H$, the solution exists for all $t$, and the $U(t)$ form a one-parameter group. For the (parabolic, dissipative) heat equation, $A = k\Delta$ is a positive multiple of the Laplacian $\Delta = \partial_x^2+\partial_y^2+\partial_z^2$, the solution exists only for $t\ge0$, and we only get a semigroup.



8.6 Lie *-algebras 

Many Lie algebras of interest in physics have an additional structure: an adjoint mapping 
compatible with the Lie product. 

8.6.1 Definition. A Lie *-algebra is a Lie algebra $L$ over $\mathbb C$ with a distinguished element $1\ne0$, called one, and a mapping $*$ that assigns to every $f\in L$ an adjoint $f^*\in L$ such that
$$(f+g)^* = f^*+g^*,\qquad (f\angle g)^* = f^*\angle g^*,$$
$$f^{**} = f,\qquad (\lambda f)^* = \bar\lambda f^*,\qquad 1^* = 1$$
for all $f,g\in L$ and $\lambda\in\mathbb C$. We identify the multiples of $1$ with the corresponding complex numbers.
The reason why we include the $1$ in the definition of a Lie *-algebra is that many physically relevant Lie algebras are equipped with a distinguished central element.⁵ But the presence of $1$ is not a restriction, since one can always adjoin a central element $1$ to a Lie algebra $L'$ without nonzero central element and form the direct sum $L = L'\oplus\mathbb C$.

An important Lie *-algebra for nonrelativistic quantum mechanics is the algebra $E = \mathrm{Lin}\,\mathbb H$ of linear operators on a Euclidean space $\mathbb H$ (usually a dense subspace of a Hilbert space $\overline{\mathbb H}$). The relevant Lie product is defined by Theorem 8.3.1 with the choice

^ r = e LinH, 
n n 

where $1_{\mathbb H}$ is the identity operator on $\mathbb H$, and the conjugate of $f\in E$ is given by the adjoint of $f$, defined as the linear mapping $f^*$ satisfying $\phi^*f\psi = (f^*\phi)^*\psi$ for all $\phi,\psi\in\mathbb H$. Dropping the index $J$ in the Lie product, we get the quantum Lie product
$$f\angle g = \frac{i}{\hbar}(fg-gf) = \frac{i}{\hbar}[f,g] \quad (8.21)$$
of $f,g\in\mathrm{Lin}\,\mathbb H$, already familiar from (1.3). Note that the axioms require the purely imaginary factor in this formula, whereas the value of Planck's constant $\hbar$ is arbitrary from a purely mathematical point of view. In quantum field theory, a different choice of $J$ is sometimes more appropriate.

For any Lie *-algebra, the set
$$\mathrm{Re}\,L := \{f\in L \mid f^* = f\}$$
is a Lie algebra over $\mathbb R$. When describing symmetries, physicists often work with Lie algebras over the reals; the present Lie *-algebras are then the complexifications of these real algebras, with a central element $1$ adjoined if necessary.

The complexification of a real Lie algebra $L$ is the Lie *-algebra $\mathbb CL$ defined as follows. In case a complex scalar multiplication is already defined on $L$, one first replaces $L$ by an isomorphic Lie algebra in which $if\notin L$ if $f\in L$ is nonzero. Then one defines
$$\mathbb CL = L\oplus iL,$$
extending scalar multiplication in a natural way to the complex field. That is, any element $f\in\mathbb CL$ is of the form
$$f = f_1+if_2$$
with $f_1,f_2\in L$, and one defines
$$\alpha(\beta f) := (\alpha\beta)f,\qquad \alpha f\angle\beta g := (\alpha\beta)\,f\angle g$$
for all $f,g\in L$ and $\alpha,\beta\in\mathbb C$. Conjugation is defined as
$$(f_1+if_2)^* := f_1-if_2\qquad\text{for } f_1,f_2\in L.$$

⁵Many such Lie algebras are realized most naturally as central extensions of semisimple Lie algebras, corresponding to projective representations of semisimple Lie algebras. By including the $1$ automatically, we work directly in the central extension and avoid the cohomological technicalities associated with the formal discussion of central extensions and projective representations.

The axioms for a Lie *-algebra are easily established if $1\in L$. Note that the real dimension of $L$ equals the complex dimension of $\mathbb CL$. It is easy to check that
$$\mathrm{Re}\,\mathbb CL\cong L.$$
Conversely, for a Lie *-algebra $L$,
$$\mathbb C\,\mathrm{Re}\,L\cong L.$$
If a complex Lie algebra $L'$ is isomorphic to $\mathbb CL$ as a Lie algebra, one says that $L$ is a real form of the complex Lie algebra $L'$.

We leave it as an exercise to verify $\mathbb C\,su(n)\cong sl(n,\mathbb C)$ and $\mathbb C\,so(p,q)\cong so(p+q,\mathbb C)$. In general, a complex Lie algebra has more than one real form, as we can see since for $p\ne q,\,n-q$ the Lie algebras $so(p,n-p)$ and $so(q,n-q)$ are not isomorphic.

An involutive Lie algebra (Neeb [180]) is a Lie algebra $L$ with Lie product $[\cdot,\cdot]$ and with an involutive, antilinear anti-automorphism $\sigma$, i.e., a mapping $\sigma: L\to L$ satisfying
$$\sigma(\alpha f) = \bar\alpha\,\sigma f,\qquad \sigma[f,g] = [\sigma g,\sigma f]$$
for $\alpha\in\mathbb C$ and $f,g\in L$. Associated to an involutive Lie algebra $L$ is the real form $L_{\mathbb R} = \{x\in L \mid \sigma x = -x\}$. Our definition of a Lie *-algebra is closely related and obtained as
follows, after adjoining to L a central element 1 if necessary. We define L as the vector 
space L equipped with the Lie product "i defined by x ~\ y = j-[x,y]. Then the mapping 
X H- > ihx, with h a positive real constant (in physical applications Planck's constant) is an 
isomorphism of Lie algebras. The map a induces the conjugation x* — —ax and ReL = L^. 

8.6.2 Remarks.

(i) The nomenclature of Lie *-algebras is a bit tricky. If $L$ is a Lie *-algebra, we therefore (usually) denote it by the name of the real Lie algebra $\mathrm{Re}\,L$. To avoid confusion, it is important to keep track of whether we are discussing real Lie algebras, complex Lie algebras, or Lie *-algebras.

(ii) In the physics literature, one often sees the defining relations (8.1) for real Lie algebras written in terms of complex structure constants,
$$X_j\angle X_k = \sum_l i\,c_{jkl}X_l,$$
where $i=\sqrt{-1}$ and the $c_{jkl}$ are real. That is, the Lie product takes values outside of the real Lie algebra! What the physicists do is that, as in the above definition of a Lie *-algebra from an involutive Lie algebra, they multiply all elements in the Lie algebra by $i$. This seemingly complicated construction has mainly historical reasons. One is that in some real algebras the elements are antihermitian matrices. By multiplying with $i$ one obtains Hermitian matrices, and operators in quantum mechanics are represented as Hermitian operators.
The converse process of complexification is realization. Given a complex Lie algebra $L$, one regards it as a real Lie algebra $L^{\mathbb R}$ by restricting scalar multiplication to real factors. Since a nonzero $f$ and its imaginary scalar multiple $if$ are linearly independent over $\mathbb R$, the real dimension of $L^{\mathbb R}$ is twice the complex dimension of $L$. In the finite-dimensional case, a convenient way to obtain the realization is as follows: choose a basis $t_1,\dots,t_n$ of $L$ and then form the elements $s_j = it_j$ for all $j$. All real linear combinations of the $s_j$ and $t_j$ make up $L^{\mathbb R}$. Given two elements $f,g$ in $L^{\mathbb R}$, one calculates their Lie product as if they were elements of $L$; the result can be written as
$$f\angle g = \sum_j (a_j+ib_j)\,t_j\qquad\text{with } a_j,b_j\in\mathbb R.$$
The Lie product of $f$ and $g$ in $L^{\mathbb R}$ is then defined as
$$f\angle g = \sum_j (a_jt_j+b_js_j).$$
See also Example 8.4.2.
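The recipe can be sketched for $L = sl(2,\mathbb C)$. The basis $t_1=h$, $t_2=e$, $t_3=f$ (the usual $\mathrm{diag}(1,-1)$ and the two matrix units off the diagonal) is our own illustrative choice, and an element of $L^{\mathbb R}$ is encoded by its six real coordinates $(a_1,a_2,a_3,b_1,b_2,b_3)$ with respect to $t_1,t_2,t_3,s_1,s_2,s_3$. The product is computed in $L$ with the matrix commutator and then split into real and imaginary parts.

```python
# Hypothetical basis choice for sl(2, C): t_1 = h, t_2 = e, t_3 = f.
H = [[1, 0], [0, -1]]
E = [[0, 1], [0, 0]]
F = [[0, 0], [1, 0]]
BASIS = [H, E, F]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def from_real_coords(x):
    """x = (a_1, a_2, a_3, b_1, b_2, b_3) encodes sum_j a_j t_j + b_j s_j, s_j = i t_j."""
    z = [complex(x[j], x[j + 3]) for j in range(3)]
    return [[sum(c * m[r][col] for c, m in zip(z, BASIS)) for col in range(2)]
            for r in range(2)]

def to_real_coords(m):
    """Real coordinates of the traceless matrix m = z_1 h + z_2 e + z_3 f."""
    z = [m[0][0], m[0][1], m[1][0]]
    return tuple(c.real for c in z) + tuple(c.imag for c in z)

def bracket_real(x, y):
    """Lie product in L^R: compute in L, then split coefficients into real/imaginary parts."""
    return to_real_coords(commutator(from_real_coords(x), from_real_coords(y)))

t1, t2 = (1, 0, 0, 0, 0, 0), (0, 1, 0, 0, 0, 0)
s1, s2 = (0, 0, 0, 1, 0, 0), (0, 0, 0, 0, 1, 0)
print(bracket_real(t1, t2) == (0, 2, 0, 0, 0, 0))   # True: [h, e] = 2e = 2 t_2
print(bracket_real(s1, s2) == (0, -2, 0, 0, 0, 0))  # True: [ih, ie] = -2e = -2 t_2
print(bracket_real(t1, s2) == (0, 0, 0, 0, 2, 0))   # True: [h, ie] = 2ie = 2 s_2
```

The real dimension is 6, twice the complex dimension 3, as stated above.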
Chapter 9



Mechanics in Poisson algebras 



This chapter brings more physics into play by introducing Poisson algebras, i.e., associative 
algebras with a compatible Lie algebra structure. These are the algebras in which it is 
possible to define Hamiltonian mechanics. Poisson algebras abstract the algebraic features 
of both Poisson brackets and commutators, and hence serve as a unifying tool relating 
classical and quantum mechanics. In particular, we discuss classical Poisson algebras for 
oscillating and rotating systems. 



9.1 Poisson algebras 

Many algebras that we will encounter have both an associative product and a Lie product, 
which are compatible in a certain sense. Such algebras are Poisson algebras, our definition 
of which is the noncommutative version discussed, e.g., in Farkas & G. Letzter [77]. (In 
contrast, in classical mechanics on Poisson manifolds, one usually assumes Poisson algebras 
to be always commutative.) 

9.1.1 Definition. A Poisson algebra $E$ is a Lie algebra with an associative and distributive multiplication, which associates with $f,g\in E$ their product $fg$, and an identity $1$ with respect to multiplication, such that the compatibility condition
$$f\angle(gh) = (f\angle g)h+g(f\angle h) \quad (9.1)$$
holds. Equation (9.1) is also called the Leibniz identity. In expressions involving the associative product and the Lie product, the associative product binds more strongly than the Lie product, i.e., $f\angle gh$ is interpreted as $f\angle(gh)$, and $fg\angle h$ as $(fg)\angle h$.

9.1.2 Remarks. Since Poisson algebras have two products, neither of which is assumed to be commutative, we reserve the notation $[f,g]$ for the commutator
$$[f,g] := fg-gf$$
with respect to the associative product. If $[f,g]=0$, we say that $f$ and $g$ commute. If $f\angle g=0$, we say that $f$ and $g$ Lie commute. An element which commutes (Lie commutes) with every element in $E$ is called central (Lie central).

9.1.3 Example. We take $C^\infty(\mathbb R\times\mathbb R)$, where the associative product is given by ordinary multiplication of functions, and where the Lie product is given by $f\angle g = f_pg_q-f_qg_p$. To see that the Leibniz condition is satisfied, we write
$$\begin{aligned} f\angle gh &= f_p(gh)_q-f_q(gh)_p\\ &= f_pg_qh+f_pgh_q-f_qg_ph-f_qgh_p\\ &= (f\angle g)h+g(f\angle h).\end{aligned}$$
Thus $C^\infty(\mathbb R\times\mathbb R)$ is a commutative Poisson algebra.

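9.1.4 Example. For a Euclidean space $\mathbb H$ we consider the space $\mathrm{Lin}\,\mathbb H$ of continuous linear operators on $\mathbb H$. The Lie product is given by
$$f\angle g = \frac{i}{\hbar}[f,g] = \frac{i}{\hbar}(fg-gf).$$
We have
$$\begin{aligned} f\angle gh &= \frac{i}{\hbar}(fgh-ghf)\\ &= \frac{i}{\hbar}(fgh-gfh+gfh-ghf)\\ &= \frac{i}{\hbar}\big([f,g]h+g[f,h]\big) = (f\angle g)h+g(f\angle h).\end{aligned}$$
Hence $\mathrm{Lin}\,\mathbb H$ is a noncommutative Poisson algebra. In particular, taking $\mathbb H=\mathbb C^n$, we find that $\mathbb C^{n\times n}$ is a noncommutative Poisson algebra.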
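The factor $i/\hbar$ multiplies both sides of the Leibniz identity equally, so it suffices to check $[f,gh] = [f,g]h+g[f,h]$ for the plain commutator. A quick exact check on random integer $3\times3$ matrices (illustrative helper names):

```python
import random

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

def commutator(a, b):
    return combine(matmul(a, b), matmul(b, a), sign=-1)

rng = random.Random(7)
n = 3
f, g, h = ([[rng.randint(-4, 4) for _ in range(n)] for _ in range(n)] for _ in range(3))
lhs = commutator(f, matmul(g, h))
rhs = combine(matmul(commutator(f, g), h), matmul(g, commutator(f, h)))
print(lhs == rhs)  # True
```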

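9.1.5 Proposition. Let $E$ be a Poisson algebra. Then

$$f \angle 1 = 0,$$

and

$$f \angle g^n = n g^{n-1}(f \angle g) \quad \text{if } [f \angle g, g] = 0.$$

Proof. We first take $n = 0$ and calculate

$$f \angle 1 = f \angle (1 \cdot 1) = (f \angle 1)1 + 1(f \angle 1) = 2(f \angle 1),$$

from which it follows that $f \angle 1 = 0$. For $n = 1$ the formula is trivial. Let us therefore suppose that it is true for all $k$ with $1 \le k \le n$. Then for $k = n + 1$ the Leibniz identity gives

$$f \angle g^{n+1} = (f \angle g^n)g + g^n(f \angle g) = n g^{n-1}(f \angle g)g + g^n(f \angle g) = (n + 1) g^n (f \angle g),$$

where in the last step we used that $f \angle g$ commutes with $g$.

□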
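Proposition 9.1.5 can be illustrated in the noncommutative Poisson algebra $\mathbb{C}^{3 \times 3}$ of Example 9.1.4. The matrices below are our own choice: with $g$ the nilpotent Jordan block and $f = \mathrm{diag}(2, 1, 0)$ one finds $[f, g] = g$, so $f \angle g = (i/\hbar)g$ commutes with $g$ and the hypothesis of the proposition holds. Again $\hbar = 1$ is assumed, and the helper names are hypothetical.

```python
HBAR = 1.0  # assumption: units with hbar = 1

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matsub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(s, a):
    return [[s * x for x in row] for row in a]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matpow(a, n):
    out = identity(len(a))
    for _ in range(n):
        out = matmul(out, a)
    return out

def lie(a, b):
    """Quantum Lie product a ∠ b = (i / hbar) [a, b]."""
    return scale(1j / HBAR, matsub(matmul(a, b), matmul(b, a)))

ZERO = [[0] * 3 for _ in range(3)]
f = [[2, 0, 0], [0, 1, 0], [0, 0, 0]]
g = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # nilpotent Jordan block, g^3 = 0

fg = lie(f, g)
assert matsub(matmul(fg, g), matmul(g, fg)) == ZERO   # hypothesis [f∠g, g] = 0

print(lie(f, identity(3)) == ZERO)   # f ∠ 1 = 0
for n in range(1, 4):
    lhs = lie(f, matpow(g, n))
    rhs = scale(n, matmul(matpow(g, n - 1), fg))
    print(n, lhs == rhs)             # f ∠ g^n = n g^(n-1) (f ∠ g)
```

All entries are small integers times $i$, so the comparisons are exact; for $n = 3$ both sides vanish because $g^3 = 0$.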
9.1.6 Definition. A Poisson *-algebra is a Poisson algebra that, as a Lie algebra, is a Lie *-algebra (defined in Definition 8.6.1) satisfying the additional rule

$$(fg)^* = g^* f^*.$$

Note the change of order in $(fg)^* = g^* f^*$.

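9.1.7 Example. The commutative Poisson algebra $C^\infty(\mathbb{R} \times \mathbb{R})$ is made into a Poisson *-algebra by defining

$$f^*(p, q) := \overline{f(p, q)}.$$

We have $(f^*)_p(p, q) = \overline{f_p(p, q)}$, and

$$(fg)^*(p, q) = \overline{f(p, q) g(p, q)} = \overline{f(p, q)}\;\overline{g(p, q)} = (f^* g^*)(p, q),$$

hence $(fg)^* = f^* g^* = g^* f^*$ since the algebra is commutative. From these considerations it follows immediately that $C^\infty(\mathbb{R} \times \mathbb{R})$ is a Poisson *-algebra.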

9.1.8 Example. We make $\mathrm{Lin}\, H$ with the quantum Lie product (8.21) into a noncommutative Poisson *-algebra by defining $A^*$ to be the adjoint of $A$, i.e., the linear operator $A^*$ such that

$$(A\phi, \psi) = (\phi, A^*\psi) \quad \text{for all } \phi, \psi \in H,$$

where $(\cdot, \cdot)$ denotes the inner product on $H$. In particular, if $H = \mathbb{C}^n$ then $(\phi, \psi) = \phi^*\psi$ and $A^*$ is the conjugate transpose of the matrix $A \in \mathbb{C}^{n \times n}$. For general $H$, we have

$$(AB\phi, \psi) = (B\phi, A^*\psi) = (\phi, B^*A^*\psi),$$

from which we read off that $(AB)^* = B^*A^*$. It then follows that

$$\begin{aligned} (A \angle B)^* &= \Big(\frac{i}{\hbar}[A, B]\Big)^* = -\frac{i}{\hbar}\big((AB)^* - (BA)^*\big) = -\frac{i}{\hbar}(B^*A^* - A^*B^*) \\ &= \frac{i}{\hbar}(A^*B^* - B^*A^*) = \frac{i}{\hbar}[A^*, B^*] = A^* \angle B^*. \end{aligned}$$

Hence $\mathrm{Lin}\, H$ is a Poisson *-algebra.
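The rules of Example 9.1.8 can be checked numerically for $H = \mathbb{C}^n$, where $A^*$ is the conjugate transpose and $(\phi, \psi) = \phi^*\psi$. The sketch below (standard library only, $\hbar = 1$ assumed, helper names ours) tests $(A\phi, \psi) = (\phi, A^*\psi)$, $(AB)^* = B^*A^*$ and $(A \angle B)^* = A^* \angle B^*$ on random complex matrices and vectors.

```python
import random

HBAR = 1.0  # assumption: units with hbar = 1

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def adjoint(a):
    """Conjugate transpose A*."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def inner(phi, psi):
    """(phi, psi) = phi* psi, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(phi, psi))

def lie(a, b):
    """Quantum Lie product a ∠ b = (i / hbar) (ab - ba)."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[1j / HBAR * (x - y) for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

rng = random.Random(1)

def rand_c():
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))

n = 3
A = [[rand_c() for _ in range(n)] for _ in range(n)]
B = [[rand_c() for _ in range(n)] for _ in range(n)]
phi = [rand_c() for _ in range(n)]
psi = [rand_c() for _ in range(n)]

print(abs(inner(matvec(A, phi), psi) - inner(phi, matvec(adjoint(A), psi))) < 1e-9)
print(close(adjoint(matmul(A, B)), matmul(adjoint(B), adjoint(A))))
print(close(adjoint(lie(A, B)), lie(adjoint(A), adjoint(B))))
```

The sign flip of $i/\hbar$ under conjugation, combined with the order reversal $(AB)^* = B^*A^*$, is exactly what makes the last check succeed.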

9.2 Rotating rigid bodies 

The spinning top is the classical model of a spinning particle. Like a football, the top can be slightly deformed, but when the external force is released it quickly returns to its equilibrium state. Molecular versions of a football are the fullerenes, the most football-like fullerene being a molecule with 60 carbon atoms arranged in precisely the same manner as the vertices that can be seen at the corners between the patches on the surface of an official football. In a reasonable approximation, the deformability can be neglected; the spinning top, and also the fullerene football, is most often treated as a rigid body.
The spinning top is treated in most undergraduate courses in mechanics; hence there is a rich literature on the topic.¹ Due to the abundance of classical treatments of the spinning top, we pursue here a nonstandard approach based on Poisson algebras.

A rigid body can move as a whole, that is, its center of mass can have a nonzero velocity; but by changing to comoving coordinates via a time-dependent translation, one may assume that the center of mass is not moving. The coordinate system in which the center of mass of the rigid body is fixed is called the center of mass coordinate system in the physics literature. Without loss of generality we then assume that the center of mass is at the origin $(0, 0, 0)$.

Having fixed the center of mass, the rigid body can still rotate, but after rotating the coordinate system to the body-fixed one, no freedom is left. This means that the pose of a rigid body with fixed center of mass is completely described by a rotation $Q(t) \in SO(3)$.

Thus $Q(t)$ satisfies $Q(t)Q(t)^T = Q(t)^TQ(t) = 1$ and $\det Q(t) = 1$. Differentiating the first relation, we get

$$\dot Q(t)Q(t)^T + Q(t)\dot Q(t)^T = 0.$$

Calling $\Omega(t) := \dot Q(t)Q(t)^T = \dot Q(t)Q(t)^{-1}$, we thus have

$$\Omega(t)^T = -\Omega(t),$$

that is, $\Omega$ is antisymmetric. We can therefore parameterize $\Omega$ as

$$\Omega = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}.$$

We then have $\Omega v = \omega \times v$, where $\omega$ is the vector $(\omega_1, \omega_2, \omega_3)^T$. We view $\Omega(t)$ as a matrix $X(\omega(t))$ depending on the vector $\omega(t)$, called the angular velocity.
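The parameterization of $\Omega$ can be checked directly. The sketch below builds $X(\omega)$, confirms $X(\omega)v = \omega \times v$, and recovers the angular velocity of a uniform rotation $Q(t)$ about the third axis from $\Omega = \dot Q Q^T$, approximating $\dot Q$ by a central difference. The test rotation, its rate `RATE`, and the step size are our own choices for illustration.

```python
import math

def X(w):
    """Antisymmetric matrix with X(w) v = w x v."""
    w1, w2, w3 = w
    return [[0.0, -w3, w2], [w3, 0.0, -w1], [-w2, w1, 0.0]]

def matvec(a, v):
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

RATE = 0.8  # hypothetical angular speed about the third axis

def Q(t):
    """Rotation by the angle RATE * t about the third coordinate axis."""
    c, s = math.cos(RATE * t), math.sin(RATE * t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

w, v = [0.3, -1.2, 0.5], [2.0, 0.1, -0.7]
print(matvec(X(w), v), cross(w, v))

# Omega = Qdot Q^T with a central-difference approximation of Qdot
t, dt = 0.4, 1e-6
Qdot = [[(x - y) / (2 * dt) for x, y in zip(r1, r2)] for r1, r2 in zip(Q(t + dt), Q(t - dt))]
Omega = matmul(Qdot, transpose(Q(t)))
omega = [Omega[2][1], Omega[0][2], Omega[1][0]]   # read off (w1, w2, w3)
print(omega)   # approximately [0, 0, RATE]
```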

A rigid body in a conservative system has an energy that can depend on the position, determined by $Q(t)$, and on the velocity $\dot Q(t)$. Since $\Omega = \dot Q Q^{-1}$, we have $\dot Q = \Omega Q = X(\omega)Q$, so the energy depends on $Q$ and $\omega$: the Hamiltonian $H = H(Q, \omega)$ is a function of $Q$ and $\omega$.

For a freely rotating body, the Hamiltonian consists of the kinetic energy only and is quadratic in the angular velocity,

$$H = \frac{1}{2}\omega^T I \omega,$$

and we can always take $I$ symmetric, $I = I^T$. The $3 \times 3$-matrix $I$, called the tensor of moments of inertia, or just inertia tensor, has the meaning of an angular mass matrix analogous to the mass matrix $M$ given in Chapter 2 for the case of an oscillating particle, where the kinetic energy was given by $H = \frac{1}{2}v^TMv$. The reason why it is called a tensor and not a matrix is that $I$ is in fact a bilinear form.² Under a coordinate change, $I$ transforms as a bilinear form and not as a matrix. Indeed, under the change of coordinates $\omega \mapsto Q\omega$ for some $Q \in SO(3)$, the Hamiltonian is invariant and thus $I$ transforms as $I \mapsto Q^{-T}IQ^{-1}$, that is, by a congruence transformation. In contrast, a matrix $A$ transforms as $A \mapsto QAQ^{-1}$, which is a similarity transformation. By a coordinate change, $I$ can be made diagonal, so that we may assume that

$$H = \frac{1}{2}\sum_{k=1}^{3} I_k \omega_k^2.$$

The coefficients $I_k$ are called the principal moments of inertia. To have a Hamiltonian that is bounded from below we require $I_k \ge 0$. In practice one has $I_k > 0$ for all $k = 1, 2, 3$; then $I$ is invertible.

¹Good accounts of the standard approach can be found, e.g., in Arnold [12], Marion & Thornton [163], or Goldstein [94].

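In analogy to the linear momentum $p = Mv = \partial H/\partial v$ for an oscillating particle with kinetic energy $H = \frac{1}{2}v^TMv$, we define the angular momentum $J$ by

$$J := \frac{\partial H}{\partial \omega} = I\omega.$$

We rewrite the Hamiltonian as a function of $J$:

$$H = \frac{1}{2}J^T I^{-1} J, \qquad (9.2)$$

in analogy to the formula $H = \frac{1}{2}p^TM^{-1}p$ for the oscillating particle. We have

$$\frac{\partial H}{\partial J} = I^{-1}J = \omega, \qquad (9.3)$$

in analogy with $v = \partial H/\partial p = M^{-1}p$.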
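A small numerical check of (9.2) and (9.3): for a diagonal inertia tensor with hypothetical principal moments, the kinetic energy computed from $\omega$ agrees with $\frac{1}{2}J^TI^{-1}J$, and a finite-difference gradient of $H(J)$ reproduces $\omega$. The values of `I_PRINCIPAL` and `w` are made up for illustration.

```python
I_PRINCIPAL = (1.0, 2.0, 3.0)   # hypothetical principal moments I_1, I_2, I_3

def energy_from_omega(w):
    """H = (1/2) w^T I w for diagonal I."""
    return 0.5 * sum(Ik * wk * wk for Ik, wk in zip(I_PRINCIPAL, w))

def angular_momentum(w):
    """J = I w."""
    return [Ik * wk for Ik, wk in zip(I_PRINCIPAL, w)]

def energy_from_J(J):
    """H = (1/2) J^T I^{-1} J, eq. (9.2)."""
    return 0.5 * sum(Jk * Jk / Ik for Ik, Jk in zip(I_PRINCIPAL, J))

def grad(fun, x, h=1e-6):
    """Central-difference gradient of a scalar function of a 3-vector."""
    out = []
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        out.append((fun(xp) - fun(xm)) / (2 * h))
    return out

w = [0.3, -1.2, 0.5]
J = angular_momentum(w)
print(energy_from_omega(w), energy_from_J(J))   # both approximately 1.86
print(grad(energy_from_J, J))                  # approximately w, eq. (9.3)
```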



9.3 Rotations and angular momentum 

In Section 1.6, we used the $J_k$ as generators of the rotations; they are basis elements of the Lie algebra $L = so(3)$. They correspond to the angular momenta of a spinning particle (see Section 9.2). Thus there is a more physical interpretation: the $J_k$ correspond to measurable quantities, the components of the angular momentum. We denote the observable that corresponds to $J_k$ by the same symbol $J_k$. Purely classically, the state of a rigid rotating body in its rest frame is defined by specifying a numerical value for $J = (J_1, J_2, J_3)^T$, called the angular momentum of the rigid body. The state uniquely determines the value of every classical observable $f(J)$ (such as the angular velocity $\omega = I^{-1}J$ of a spinning top, where $I$ is a constant matrix called the inertia tensor). The dynamics is determined by the equation $\dot J = J \times \omega$. We thus see the angular momentum components $J_k$ as functions on phase space. Therefore we want to study the polynomial algebra $\mathrm{Pol}\, L$ generated by $1$ and the $J_k$, and to give this algebra the structure of a Poisson algebra. The recipe obtained will then be further generalized to cover arbitrary $C^\infty$-functions of $J$.

²The same holds for the mass matrix, but there the terminology has become traditional.

Motivated by the $so(3)$ structure, we define a product $\angle$ recursively, starting with the commutation relations of $so(3)$ with $1$ adjoined,

$$1 \angle J_k = 0, \qquad J_k \angle J_l = \sum_m \epsilon_{klm} J_m.$$

With the abbreviation $a_1J_1 + a_2J_2 + a_3J_3 = a \cdot J$, this gives

$$1 \angle a \cdot J = 0, \qquad a \cdot J \angle b \cdot J = (a \times b) \cdot J.$$

Having given the product on the generators of $\mathrm{Pol}\, L$, the product is completely determined by the Leibniz rule

$$a \cdot J f(J) \angle b \cdot J = (a \cdot J \angle b \cdot J) f(J) + a \cdot J \big(f(J) \angle b \cdot J\big)$$

for $f \in \mathrm{Pol}\, L$.

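9.3.1 Lemma. We have the identity

$$f(J) \angle b \cdot J = (b \times J) \cdot \frac{\partial f(J)}{\partial J},$$

where $J$ as a vector in $(\mathrm{Pol}\, L)^3$ means $(J_1, J_2, J_3)^T$.

Proof. The proof is by induction on the degree of $f$. For degree zero the statement is trivial, since both sides vanish. For degree 1 we have

$$a \cdot J \angle b \cdot J = (a \times b) \cdot J = (b \times J) \cdot a = (b \times J) \cdot \frac{\partial (a \cdot J)}{\partial J}.$$

Here we use vector notation, that is, we consider $(\mathrm{Pol}\, L)^3$. Now suppose the statement is true for polynomials of degree at most $n$ for some $n \ge 1$. We consider next a homogeneous polynomial of degree $n + 1$ and write it as $a \cdot J f(J)$ with $f$ of degree $n$ (or a linear combination of such terms). On the one hand,

$$a \cdot J f(J) \angle b \cdot J = (a \times b) \cdot J f(J) + a \cdot J\, (b \times J) \cdot \frac{\partial f(J)}{\partial J},$$

and on the other hand

$$\begin{aligned} (b \times J) \cdot \frac{\partial}{\partial J}\big(a \cdot J f(J)\big) &= (b \times J) \cdot a\, f(J) + a \cdot J\, (b \times J) \cdot \frac{\partial f(J)}{\partial J} \\ &= (a \times b) \cdot J f(J) + a \cdot J\, (b \times J) \cdot \frac{\partial f(J)}{\partial J}, \end{aligned}$$

and by inspection the two expressions are the same. □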



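9.3.2 Lemma. The product $\angle$ satisfies

$$f(J) \angle g(J) = J \cdot \Big(\frac{\partial f}{\partial J} \times \frac{\partial g}{\partial J}\Big). \qquad (9.4)$$

Proof. We again proceed by induction, this time on the degree of $g$. For degree $\le 1$ of $g$, the previous lemma gives the result, since $(b \times J) \cdot \frac{\partial f}{\partial J} = J \cdot \big(\frac{\partial f}{\partial J} \times b\big)$. Now suppose the result holds for polynomials up to degree $n \ge 1$. Consider a polynomial of degree $n + 1$ and write it as a sum of terms $g(J)h(J)$, where $g$ and $h$ both have degree $\le n$. Then for each such term we have

$$\begin{aligned} f(J) \angle \big(g(J)h(J)\big) &= \big(f(J) \angle g(J)\big)h(J) + g(J)\big(f(J) \angle h(J)\big) \\ &= J \cdot \Big(\frac{\partial f}{\partial J} \times \frac{\partial g}{\partial J}\Big) h + g\, J \cdot \Big(\frac{\partial f}{\partial J} \times \frac{\partial h}{\partial J}\Big) \\ &= J \cdot \Big(\frac{\partial f}{\partial J} \times \frac{\partial (gh)}{\partial J}\Big). \end{aligned}$$

□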

Note that (9.4) makes sense for arbitrary $C^\infty$ functions of $J$, although it was derived only for polynomials.

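9.3.3 Proposition. The algebra $E = C^\infty(\mathbb{R}^3)$ and its subalgebra $\mathrm{Pol}\, L$ are Poisson algebras. That is, the product (9.4) is a Lie product satisfying the Leibniz identity.

Proof. The antisymmetry of the product $\angle$ is obvious on the generators; for the other cases we use Lemma 9.3.1 and Lemma 9.3.2 together with the observation that $u \cdot (J \times w) = w \cdot (u \times J) = -w \cdot (J \times u)$. The Leibniz identity is a direct consequence of the product rule for partial derivatives. The Jacobi identity is a bit tedious to check. Using the notation $f_k = \partial f/\partial J_k$ and the Levi-Civita symbol, one writes the vector product as $(u \times v)_k = \sum_{lm} \epsilon_{klm} u_l v_m$. Then we find for the Lie product

$$f \angle g = \sum_{klm} \epsilon_{klm} J_l f_m g_k,$$

from which the antisymmetry follows immediately. Using the identity

$$\sum_m \epsilon_{klm}\epsilon_{mnp} = \delta_{kn}\delta_{lp} - \delta_{kp}\delta_{ln},$$

one obtains after some algebraic calculations

$$(f \angle g) \angle h = h_k f_k J_l g_l - h_k g_k J_l f_l + \epsilon_{klm}\epsilon_{abc} J_m J_c h_l (f_a g_{bk} + f_{ak} g_b),$$

where the summations are over all repeated indices and $f_{ak} = \partial^2 f/\partial J_a \partial J_k$. When summing over the cyclic permutations of $f$, $g$ and $h$, the first two terms are easily seen to give zero. We write the cyclic sum of the remaining terms as

$$\epsilon_{klm}\epsilon_{abc} J_m J_c (f_a g_{bk} h_l + f_{ak} g_b h_l + g_a h_{bk} f_l + g_{ak} h_b f_l + h_a f_{bk} g_l + h_{ak} f_b g_l),$$

and focus on the terms with two derivatives on $f$:

$$\begin{aligned} \epsilon_{klm}\epsilon_{abc} J_m J_c (f_{ak} g_b h_l + h_a f_{bk} g_l) &= \epsilon_{klm}\epsilon_{abc} J_m J_c (f_{ak} g_b h_l + h_k f_{la} g_b) \\ &= \epsilon_{klm}\epsilon_{abc} J_m J_c (f_{ak} g_b h_l - h_l f_{ka} g_b) \\ &= 0. \end{aligned}$$

Here the first step renames the index triples $(a, b, c) \leftrightarrow (k, l, m)$ in the second term, the second step interchanges the summation indices $k$ and $l$ in that term, and the last step uses $f_{ka} = f_{ak}$. The other terms cancel similarly. □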
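Proposition 9.3.3 can be spot-checked exactly on polynomials. The sketch below implements (9.4) for polynomials in $J_1, J_2, J_3$ with integer coefficients, stored as dictionaries from exponent triples to coefficients (this representation and all helper names are our own). It verifies the $so(3)$ commutation relations, the Jacobi identity for three sample polynomials, and that $J^2 = J_1^2 + J_2^2 + J_3^2$ Lie commutes with every $J_k$.

```python
# Polynomials in J1, J2, J3 as dicts {(a, b, c): coeff} for coeff * J1**a * J2**b * J3**c.

def padd(*polys):
    out = {}
    for f in polys:
        for k, c in f.items():
            out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

def pmul(f, g):
    out = {}
    for k1, c1 in f.items():
        for k2, c2 in g.items():
            k = tuple(x + y for x, y in zip(k1, k2))
            out[k] = out.get(k, 0) + c1 * c2
    return {k: c for k, c in out.items() if c != 0}

def pneg(f):
    return {k: -c for k, c in f.items()}

def pdiff(f, i):
    """Partial derivative with respect to J_{i+1}."""
    out = {}
    for k, c in f.items():
        if k[i] > 0:
            kk = list(k)
            kk[i] -= 1
            out[tuple(kk)] = k[i] * c
    return out

J = [{(1, 0, 0): 1}, {(0, 1, 0): 1}, {(0, 0, 1): 1}]

def lie(f, g):
    """Lie product (9.4): f ∠ g = J · (grad f × grad g)."""
    a = [pdiff(f, i) for i in range(3)]
    b = [pdiff(g, i) for i in range(3)]
    cross = [padd(pmul(a[1], b[2]), pneg(pmul(a[2], b[1]))),
             padd(pmul(a[2], b[0]), pneg(pmul(a[0], b[2]))),
             padd(pmul(a[0], b[1]), pneg(pmul(a[1], b[0])))]
    return padd(*(pmul(J[l], cross[l]) for l in range(3)))

f = {(2, 1, 0): 1, (0, 0, 1): 3}     # J1^2 J2 + 3 J3
g = {(1, 0, 1): 1, (0, 2, 0): -2}    # J1 J3 - 2 J2^2
h = {(1, 0, 0): 1, (0, 1, 2): 1}     # J1 + J2 J3^2
casimir = padd(*(pmul(Jk, Jk) for Jk in J))

print(lie(J[0], J[1]) == J[2], lie(J[1], J[2]) == J[0], lie(J[2], J[0]) == J[1])
print(padd(lie(lie(f, g), h), lie(lie(g, h), f), lie(lie(h, f), g)) == {})   # Jacobi
print(all(lie(Jk, casimir) == {} for Jk in J))                              # J^2 is central
```

A check on three polynomials is of course no proof; it merely guards against sign slips in the index computation above.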



9.4 Classical rigid body dynamics 

Many books on classical mechanics, see for example Marion and Thornton [163], 
Arnold [12] or Goldstein [94], present the standard approach to the dynamics of a 
spinning rigid body, resulting in the Euler equations for the spinning top. We take an 
alternative route. We write down the Lie product that determines the mechanics. We then 
derive the Euler equations and reproduce the same equations of motion. Thus we are giving 
an equivalent description. 

The motivation for the form of the Lie product comes from symmetry considerations. We have seen that the algebra of infinitesimal rotations, which must be involved in the differential equations describing the state of the spinning object, is $so(3)$, the Lie algebra of real, antisymmetric $3 \times 3$-matrices. In Section 9.5, we shall see that we can obtain a Lie-Poisson algebra from any Lie algebra, and in Example 9.5.3 we shall construct the Lie-Poisson algebra of $so(3)$. Since the dynamical observables of a physical system form a Poisson algebra, we consider the Lie-Poisson algebra of arbitrarily often differentiable functions on $\mathbb{R}^3$, with coordinates $J_1$, $J_2$ and $J_3$, equipped with the Lie product (9.4) given in Section 9.3,

$$f \angle g = J \cdot \Big(\frac{\partial f}{\partial J} \times \frac{\partial g}{\partial J}\Big)$$

for $f, g \in C^\infty(\mathbb{R}^3)$. Now that we have the Poisson algebra and the Hamiltonian (9.2) for the classical mechanics of the spinning top, we can apply the usual recipe. For an observable $f$ the time evolution is given by

$$\dot f = H \angle f.$$
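In particular, for the angular momentum we have from (9.3)

$$\dot J_k = H \angle J_k = \Big(J \times \frac{\partial H}{\partial J}\Big) \cdot e_k = (J \times \omega)_k,$$

where $e_k$ is the unit vector in the direction $k$, and where we use $\partial H/\partial J = I^{-1}J = \omega$. We thus have

$$\dot J = J \times \omega.$$

Further, since $J = I\omega$ with constant $I$, we find $I\dot\omega = I\omega \times \omega$. Writing this out in components (with $I$ diagonal as above) we find

$$\begin{aligned} I_1\dot\omega_1 &= \omega_2\omega_3(I_2 - I_3), \\ I_2\dot\omega_2 &= \omega_3\omega_1(I_3 - I_1), \qquad (9.5) \\ I_3\dot\omega_3 &= \omega_1\omega_2(I_1 - I_2). \end{aligned}$$

The equations (9.5) are the Euler equations for the spinning top. The spinning direction is given by the vector $n := \omega/|\omega|$ and the spinning speed is given by $|\omega|$. Thus knowing the trajectory of $\omega(t)$ in phase space at all times implies knowing everything about the direction and speed of the spinning motion.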
We claim that $J^2 = J \cdot J$ is a Casimir of the Lie algebra $so(3)$. Indeed, from (1.20) we have $J_1 \angle J_2 = J_3$, and the other commutation relations can be obtained by cyclic permutation. But then

$$J_1 \angle J^2 = J_1 \angle J_1^2 + J_1 \angle J_2^2 + J_1 \angle J_3^2 = 0 + (J_3J_2 + J_2J_3) - (J_2J_3 + J_3J_2) = 0,$$

and for the other generators the results are similar. Since $J^2$ is a Casimir of the Lie algebra, it is conserved by the dynamics. Indeed, calculating the time derivative of $J^2$, we find

$$\frac{d}{dt}J^2 = 2J \cdot \dot J = 2J \cdot (J \times \omega) = 0.$$

Hence the motion preserves surfaces of constant $J^2$, which are spheres. The radius of the sphere is determined by the initial conditions.

Note that the phase space is not an ordinary (i.e., symplectic) phase space; it is not even-dimensional. However, since we have a Poisson algebra, it is a Poisson manifold as described in Section 12.1. In the present case, the symplectic leaves (coadjoint orbits) are the surfaces where the Casimir has a constant value; hence they are the spheres on which the motion takes place. Indeed, 2-dimensional spheres in $\mathbb{R}^3$ have a natural symplectic manifold structure, on which the rotation group $SO(3)$ acts as a group of symplectic transformations.

Since the Hamiltonian is conserved (although for completely different reasons), the motion also preserves surfaces of constant $E = \frac{1}{2}J^TI^{-1}J$, which are ellipsoids. If $I$ is not a multiple of the identity, this forces the motion to be on curves of constant $J^TI^{-1}J$ on the coadjoint orbit, i.e., on the intersection of the sphere defining the coadjoint orbit with the ellipsoid $J^TI^{-1}J = 2E$, where $E$ is again determined by the initial conditions. Then only the speed along these curves needs to be determined to specify the motion. Thus the free spinning rigid body motion is exactly solvable.
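The two conservation laws can be observed numerically. The following sketch integrates the Euler equations (9.5) with the classical fourth-order Runge-Kutta method (our choice of integrator; it is not part of the text) for hypothetical principal moments, and monitors $J^2 = \sum_k I_k^2\omega_k^2$ and $E = \frac{1}{2}\sum_k I_k\omega_k^2$, which should stay constant up to the integration error.

```python
I_PRINCIPAL = (1.0, 2.0, 3.0)   # hypothetical principal moments I_1, I_2, I_3

def euler_rhs(w):
    """Right-hand side of the Euler equations (9.5), solved for dw/dt."""
    w1, w2, w3 = w
    I1, I2, I3 = I_PRINCIPAL
    return (w2 * w3 * (I2 - I3) / I1,
            w3 * w1 * (I3 - I1) / I2,
            w1 * w2 * (I1 - I2) / I3)

def rk4_step(w, dt):
    """One classical Runge-Kutta step."""
    def shifted(k, a):
        return tuple(wi + a * ki for wi, ki in zip(w, k))
    k1 = euler_rhs(w)
    k2 = euler_rhs(shifted(k1, dt / 2))
    k3 = euler_rhs(shifted(k2, dt / 2))
    k4 = euler_rhs(shifted(k3, dt))
    return tuple(wi + dt / 6 * (a + 2 * b + 2 * c + d)
                 for wi, a, b, c, d in zip(w, k1, k2, k3, k4))

def casimir(w):
    """J . J with J = I w."""
    return sum((Ik * wk) ** 2 for Ik, wk in zip(I_PRINCIPAL, w))

def energy(w):
    """E = (1/2) J^T I^{-1} J = (1/2) w^T I w."""
    return 0.5 * sum(Ik * wk * wk for Ik, wk in zip(I_PRINCIPAL, w))

w0 = (0.1, 1.0, 0.2)   # spin mostly about the intermediate axis
w = w0
for _ in range(20000):
    w = rk4_step(w, 1e-3)
print(w)
print(abs(casimir(w) - casimir(w0)), abs(energy(w) - energy(w0)))
```

Since both quantities are conserved exactly by (9.5), any drift measures the integrator's error, not physics; the state itself is free to wander along the intersection of the sphere and the ellipsoid.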

Let us consider affine functions on the Poisson algebra of the classical spinning top. We calculate (for $a, b \in \mathbb{R}^3$ and $\alpha, \beta \in \mathbb{C}$)

$$(\alpha + a \cdot J) \angle (\beta + b \cdot J) = a \cdot J \angle b \cdot J = (a \times b) \cdot J,$$

which describes the Lie algebra $u(2)$. Looking only at linear functions, that is, the linear subspace spanned by $J$, we find the Lie algebra $so(3)$. The two Lie algebras only differ by the center of $u(2)$, and thus $u(2) = su(2) \oplus \mathbb{R} \cong \mathbb{R} \oplus so(3)$. This coincidence is due to the sporadic isomorphism $so(3) \cong su(2)$.



9.5 Lie-Poisson algebras

In the above section we started from the Lie algebra structure of $so(3)$ to construct an associated Poisson algebra. This program can be repeated for arbitrary real Lie algebras.

The formulation closest to the physical applications is in terms of a Lie *-algebra $L$. It applies to arbitrary real Lie algebras such as $so(3)$ by taking their complexification and adding, if necessary, a central element $1$, thus extending the dimension of the Lie algebra by one. As usual, we write $\mathbb{C}$ for the complex linear subspace spanned by the element $1$. In case $L$ is infinite-dimensional, we assume $L$ to be equipped with a topology in which all operations are continuous, and that $L$ is reflexive (see below); in finite dimensions this is automatic.

We consider the dual space $L^*$ of continuous linear maps from $L$ to $\mathbb{C}$, and the bidual space $L^{**}$ of continuous linear maps from the dual space $L^*$ to $\mathbb{C}$. For finite-dimensional vector spaces we have canonically $L^{**} = L$; for infinite-dimensional vector spaces, in general only $L \subseteq L^{**}$. In both cases we have an injective map $L \to L^{**}$ given by

$$f \mapsto f^{**}, \qquad f^{**}(\xi) := \xi(f) \quad \text{for all } \xi \in L^*.$$

A normed vector space is called reflexive if this map is surjective, so that $L^{**} = L$. We need $L$ to be reflexive for the construction that follows. We thus assume $L^{**} = L$ in the following.

For any real number $\lambda$ (we shall need $\lambda = 0$ and $\lambda = 1$), we define the family of parallel affine hyperplanes

$$M_\lambda := \{\xi \in L^* \mid \xi(f^*) = \overline{\xi(f)} \text{ for all } f \in L,\ \xi(1) = \lambda\}.$$

One should note that $M_0$ is a real linear subspace of $L^*$. The affine hyperplane $M_1$ carries the structure of a real submanifold, with the tangent space at each point being isomorphic to $M_0$.

If $L$ is the complexification of a real Lie algebra $L'$, so that $L = L' \otimes_{\mathbb{R}} \mathbb{C}$, then the elements of $M_0$ are the linear functionals $\xi$ on $L'$ that are zero on the element $1$, extended to linear forms on $L$ by linearity: $\xi(a + ib) = \xi(a) + i\xi(b)$ for $a, b \in L'$. So we can identify $M_0$ in this case with the dual of the quotient Lie algebra $L'/\mathbb{R}$, where $\mathbb{R}$ denotes the real subspace spanned by the distinguished central element $1$. Therefore the dual of $M_0$ is again $L'/\mathbb{R}$. In the general case $M_0$ is a real subspace of $(L/\mathbb{C})^*$, so that $M_0\mathbb{C} = M_0 + iM_0$ satisfies $(M_0\mathbb{C})^* \cong L/\mathbb{C}$.

We consider, for a nonempty open subset $M$ of $M_1$, the commutative algebra $E = C^\infty(M)$. We define for every $f \in E$ and $\xi \in M$ a linear map $df(\xi): M_0 \to \mathbb{C}$ by

$$df(\xi)v := \lim_{h \to 0} \frac{f(\xi + hv) - f(\xi)}{h} \quad \text{for all } v \in M_0.$$

So we have $df(\xi) \in \mathrm{Lin}(M_0, \mathbb{C})$. Extending by $\mathbb{C}$-linearity, we can view $df(\xi)$ as an element of $\mathrm{Lin}(M_0\mathbb{C}, \mathbb{C})$. Hence $df(\xi)$ defines an element of $(M_0\mathbb{C})^* \cong L/\mathbb{C}$. We can find an element $Df(\xi)$ in $L$ such that under the projection $L \to L/\mathbb{C}$ the element $Df(\xi)$ goes to $df(\xi)$. The choice of $Df(\xi)$ is not unique, but another choice $D'f(\xi)$ differs from $Df(\xi)$ by an element of $\mathbb{C}$, which is contained in the center.

We now show how the object $Df(\xi)$ can be chosen. We choose an arbitrary element $\omega \in L^*$ with $\omega(1) = 1$. Then we can write $L^*$ as a direct sum $L^* = M_0\mathbb{C} \oplus W$ (as a complex vector space), where $W = \mathbb{C}\omega := \{a\omega \mid a \in \mathbb{C}\}$ is the 1-dimensional span of $\omega$. Indeed, for an arbitrary element $\xi \in L^*$, the element $\xi' := \xi - \xi(1)\omega$ satisfies $\xi'(1) = 0$. Now $\xi'$ can be written as a linear combination $u + iv$ of two elements $u, v \in M_0$. Thus $\xi = u + iv + \xi(1)\omega \in M_0\mathbb{C} \oplus W$. For any fixed choice of $\omega$ we define $Df(\xi)$ by

$$Df(\xi)u := df(\xi)\big(u - u(1)\omega\big) \quad \text{for all } u \in L^*. \qquad (9.6)$$

Note that $u - u(1)\omega \in M_0\mathbb{C}$. The extended $Df(\xi)$ thus lies in $L^{**}$. But $L$ was assumed to be reflexive, hence we have $Df(\xi) \in L$.

We are now in a position to define a Lie product on $E$ by

$$(f \angle g)(\xi) := \xi\big(Df(\xi) \angle Dg(\xi)\big) \quad \text{for all } \xi \in M \subseteq M_1,$$

where the Lie product on the right-hand side is that of $L$. The left-hand side above is the complex number obtained by evaluating the function $h := f \angle g$ at the argument $\xi \in M \subseteq M_1 \subset L^*$. The right-hand side is the complex number obtained from the bilinear pairing between $Df(\xi) \angle Dg(\xi) \in L$ and the same $\xi$. Since the derivative of a smooth function is again smooth, $f \angle g$ is again an element of $E$.

The Lie product $f \angle g$ is independent of the choice of $Df(\xi)$ and $Dg(\xi)$, or equivalently, of the choice of $\omega$ in (9.6). Indeed, any other choice would differ only by an element of the center, and central elements drop out when taking the Lie product in $L$.

We have the following theorem:

9.5.1 Theorem. The algebra $E$ with the Lie product $\angle$ defined above is a Poisson algebra, called the Lie-Poisson algebra over $L$. The restriction of the Lie product of $E$ to affine functions coincides with the Lie product of $L$.

Proof. (Sketch) The definition of $\angle$ is independent of $\omega$. The antisymmetry of $\angle$ is clear, and the Jacobi identity follows from that of $L$, using the fact that partial derivatives commute. The Leibniz identity follows from the Leibniz property of differentiation. The injection $L \to L^{**}$ gives a map from the Lie algebra to the affine functions; we therefore regard the Lie algebra as a subalgebra of the affine functions. Since we assumed the Lie algebra $L$ to be reflexive, the affine functions represent elements of the Lie algebra. Indeed, for an affine function $f$ we obtain a linear function by subtracting $f(0)$, and this linear function defines an element $f'$ of $L$. But $f(0)$ is a multiple of $1$ and thus also an element of the Lie algebra; therefore $f = f' + f(0) \in L$. □

We give two important examples.

9.5.2 Example. Consider $L = h(1)$, the Heisenberg algebra, which is spanned by generators $p$, $q$ and $1$, with $p \angle q = 1$ and all other Lie products between the generators vanishing. We identify the dual $L^*$ with $\mathbb{C}^3$ as follows:

$$\xi = \begin{pmatrix} x \\ y \\ z \end{pmatrix}: \qquad \xi(\alpha p + \beta q + \gamma 1) := \alpha x + \beta y + \gamma z$$
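for any choice of $\alpha$, $\beta$, and $\gamma$ in $\mathbb{C}$. The affine hyperplane $M_0$ is in this case given by

$$M_0 = \Big\{\begin{pmatrix} x \\ y \\ 0 \end{pmatrix} \;\Big|\; x, y \in \mathbb{R}\Big\},$$

and similarly, for $M_1$ we find

$$M_1 = \Big\{\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \;\Big|\; x, y \in \mathbb{R}\Big\}.$$

If $f$ is a smooth function on $M_1$, it is a smooth function of $x$ and $y$. For $\xi = (x, y, 1)^T$ and $v = (a, b, 0)^T$ we find

$$df(\xi)v = a\frac{\partial f(x, y)}{\partial x} + b\frac{\partial f(x, y)}{\partial y}.$$

The simplest choice for $Df(\xi)$, corresponding to writing $h(1) = \mathbb{C}p \oplus \mathbb{C}q \oplus \mathbb{C}1$, is

$$Df(\xi) = \frac{\partial f(x, y)}{\partial x}\, p + \frac{\partial f(x, y)}{\partial y}\, q.$$

If $g$ is another smooth function, we have

$$Df(\xi) \angle Dg(\xi) = \Big(\frac{\partial f(x, y)}{\partial x}\frac{\partial g(x, y)}{\partial y} - \frac{\partial f(x, y)}{\partial y}\frac{\partial g(x, y)}{\partial x}\Big)1,$$

and thus

$$(f \angle g)(\xi) = \frac{\partial f(x, y)}{\partial x}\frac{\partial g(x, y)}{\partial y} - \frac{\partial f(x, y)}{\partial y}\frac{\partial g(x, y)}{\partial x},$$

which precisely corresponds to the Lie product associated with the dynamics of a single particle in one dimension (compare Example 9.1.3, with $x$ in the role of $p$ and $y$ in the role of $q$).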
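Theorem 9.5.1 and the examples suggest a coordinate formula: if $e_1, \ldots, e_n$ is a basis of $L$ with $e_i \angle e_j = \sum_k c_{ij}^k e_k$ and $\xi_k := \xi(e_k)$, then the choice $Df(\xi) = \sum_i (\partial f/\partial \xi_i)\, e_i$ gives $(f \angle g)(\xi) = \sum_{ijk} c_{ij}^k\, \xi_k\, \partial_i f\, \partial_j g$. The sketch below implements this formula with central-difference derivatives (an approximation we chose for brevity) and compares it with the closed forms for the Heisenberg algebra $h(1)$ of Example 9.5.2 and for $so(3)$ with a central element adjoined, as in Example 9.5.3. The function names and the dictionary encoding of the structure constants are our own.

```python
import math

def grad(f, xi, h=1e-5):
    """Central-difference gradient of f at the point xi (a tuple)."""
    out = []
    for i in range(len(xi)):
        xp, xm = list(xi), list(xi)
        xp[i] += h
        xm[i] -= h
        out.append((f(tuple(xp)) - f(tuple(xm))) / (2 * h))
    return out

def lie_poisson(structure, f, g, xi):
    """(f ∠ g)(xi) = sum of c_ij^k xi_k d_i f d_j g.

    `structure` maps index triples (i, j, k) to the structure constant c_ij^k.
    """
    df, dg = grad(f, xi), grad(g, xi)
    return sum(c * xi[k] * df[i] * dg[j] for (i, j, k), c in structure.items())

# Heisenberg algebra h(1), basis (p, q, 1): p ∠ q = 1, q ∠ p = -1.
HEISENBERG = {(0, 1, 2): 1.0, (1, 0, 2): -1.0}

# so(3) with a central element, basis (e1, e2, e3, 1): e_i ∠ e_j = sum_k eps_ijk e_k.
SO3 = {}
for (i, j, k) in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    SO3[(i, j, k)] = 1.0
    SO3[(j, i, k)] = -1.0

# Example 9.5.2: functions of (x, y) on M_1 = {(x, y, 1)}.
f = lambda xi: xi[0] ** 2 * xi[1]
g = lambda xi: math.sin(xi[0]) + xi[1] ** 3
x, y = 0.7, -0.4
canonical = (2 * x * y) * (3 * y ** 2) - (x ** 2) * math.cos(x)
print(lie_poisson(HEISENBERG, f, g, (x, y, 1.0)), canonical)

# Example 9.5.3: (f ∠ g)(xi) = xi · (grad f × grad g) on M_1 = {(x, y, z, 1)}.
u = lambda xi: xi[0] * xi[1]
v = lambda xi: xi[2] ** 2
xi = (0.3, -0.5, 1.1, 1.0)
# grad u = (y, x, 0) and grad v = (0, 0, 2z), so grad u × grad v = (2xz, -2yz, 0).
closed_form = xi[0] * 2 * xi[0] * xi[2] - xi[1] * 2 * xi[1] * xi[2]
print(lie_poisson(SO3, u, v, xi), closed_form)
```

The central basis element never appears as a first or second index of a nonzero structure constant, which is the coordinate form of the statement that the choice of $Df(\xi)$ only matters up to central terms that drop out.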

More generally, an arbitrary Heisenberg algebra leads to general symplectic Poisson algebras 
on convenient vector spaces. 

9.5.3 Example. We now show that for the choice $so(3)$ we recover the Lie product (9.4). We identify the real Lie algebra $so(3)$ with $\mathbb{R}^3$ equipped with the vector product. We adjoin a central element to obtain $so(3) \oplus \mathbb{R}1$ and call $L$ the complexification of $so(3) \oplus \mathbb{R}1$. We write an element of $L$ as $(v, a)$, where $v \in \mathbb{C}^3$ and $a \in \mathbb{C}$, so that the Lie product is given by

$$(v, a) \angle (w, b) = (v \times w, 0).$$

Of course, $v \times w$ is defined by extending the vector product on $\mathbb{R}^3$ by $\mathbb{C}$-linearity. We identify $L^*$ with $\mathbb{C}^4$ as follows:

$$\begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix}(v, a) := xv_1 + yv_2 + zv_3 + ta, \qquad v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}.$$
Thus we find that $M_\lambda$ consists of the vectors $(x, y, z, \lambda)^T$ with $x$, $y$ and $z$ real numbers. A smooth function on $M_1$ is just a smooth function on $\mathbb{R}^3$. For any smooth $f: \mathbb{R}^3 \to \mathbb{R}$ and $\xi = (x, y, z, 1)$ we define

$$\nabla f(\xi) := \Big(\frac{\partial f(\xi)}{\partial x}, \frac{\partial f(\xi)}{\partial y}, \frac{\partial f(\xi)}{\partial z}\Big)^T,$$

where we identify the vector $\xi = (x, y, z, 1)$ in $M_1$ with the vector $(x, y, z)$ in $\mathbb{R}^3$. We see that we can choose $Df(\xi) = (\nabla f(\xi), 0)$, and the Lie product on $E$ is then given by

$$(f \angle g)(\xi) = \xi \cdot \big(\nabla f(\xi) \times \nabla g(\xi)\big),$$

which is precisely (9.4); $(x, y, z)^T$ corresponds to $(J_1, J_2, J_3)^T$.

The attentive reader might have noticed that in Example 9.5.3, the central element $1$ played no role at all. As mentioned before, when a Lie algebra has no distinguished central element one can always add one. However, in this case one can also proceed directly as follows. For a real Lie algebra $L$, we consider the dual $L^*$ and the algebra $E$ of real-valued smooth functions on $L^*$. Let $f \in E$ and $\xi \in L^*$. The 1-form $df(\xi)$ is an element of the dual of the tangent space at $\xi$. Since $L^*$ is a vector space and $L$ is assumed to be reflexive, the dual of the tangent space at $\xi$ is again $L$. Hence $df(\xi)$ defines an element of the Lie algebra, which we also denote by $df(\xi)$. Then we define the Lie product on $E$ for $f, g \in E$ by

$$(f \angle g)(\xi) := \xi\big(df(\xi) \angle dg(\xi)\big),$$

that is, to get $(f \angle g)(\xi)$, the functional $\xi$ is evaluated at the Lie algebra element $df(\xi) \angle dg(\xi)$. We leave it as an exercise to show that this gives the same result for real Lie algebras that do not have a distinguished central element.

It turns out that the majority of commutative Poisson algebras relevant in physics are Lie-Poisson algebras constructible from a suitable Lie algebra, or natural quotients of such algebras. In particular, this holds for the Poisson algebra of classical symplectic geometry in $\mathbb{R}^{2n}$, which comes from general Heisenberg algebras, and for all but one of the Poisson algebras for nonequilibrium thermodynamics constructed in Beris and Edwards [29].



9.6 Classical symplectic mechanics

A conservative physical system is completely characterized by three main ingredients: the kinematical algebra, the Hamiltonian, and the state. The kinematical algebra of the system is a Lie *-algebra $L$ which defines the kinematics, i.e., the structure of the quantities whose knowledge determines the system. The Hamiltonian $H$ defines the dynamics. It is a Hermitian quantity in an associative algebra $E$ carrying a particular representation of the kinematical algebra: a Poisson representation in the classical case, and a unitary representation in the quantum case. The state encodes all properties of the physical state of an individual realization of the system at a fixed time.

The kinematical algebra determines the kinematical symmetries of a whole class of systems which differ in Hamiltonian and state. This means that applying a transformation of the corresponding symmetry group transforms a system of this class into another system of
the same class, usually with a different Hamiltonian. Those (often few) symmetries which 
preserve a given Hamiltonian are called symmetries of the system; applying a symmetry 
of the system changes possible state space trajectories of the system into other possible 
trajectories, usually affecting the states. Those (even fewer) symmetries which preserve the 
Hamiltonian and the state are symmetries of the particular realization of the system, and 
hence directly measurable. 

The kinematical algebra may admit (up to isomorphism) one or many Poisson representations for classical systems, and one or many unitary representations for the corresponding quantum systems. For example, a Heisenberg algebra with finitely many degrees of freedom admits, up to isomorphism, only one irreducible unitary representation, which is the content of the Stone-von Neumann theorem.

In the nonrelativistic case, the Hamiltonian is an element of the Poisson algebra for classical 
systems, and for quantum systems the Hamiltonian is an element of the universal enveloping 
algebra of the Lie *-algebra. 

Let $E$ be the algebra determined by the physical system, that is, either $E$ is the Lie-Poisson algebra of the classical system, or $E = \mathrm{Lin}\, H$ for a Euclidean space $H$ whose completion is a Hilbert space. Both the Lie algebra $L$ and the space-time symmetry group are represented inside $E$.

We now consider the special case of classical $N$-particle systems describing the motions of a molecule. The algebra $E$ consists of the complex-valued functions on phase space $M$, and each point $z \in M$ in phase space determines a state $\langle \cdot \rangle_z$ by

$$\langle f \rangle_z := f(z),$$

called a classical pure state. Note that evaluation at a point $z \in M$ is more than a linear functional; an evaluation gives an algebra homomorphism $C^\infty(M; \mathbb{R}) \to \mathbb{R}$, since $(fg)(z) = f(z)g(z)$; hence we have a character of the commutative algebra $E$. If the phase space $M$ is an open subset of $\mathbb{R}^n$, the evaluations are the only characters of $E$. This can be seen as follows. Take any nonzero algebra homomorphism $\varphi: C^\infty(M) \to \mathbb{R}$. Let $x_1, \ldots, x_n$ be coordinates on $M$ and denote by $a_i = \varphi(x_i)$ the images of the coordinate functions. The homomorphism $\varphi$ thus determines a point $z = (a_1, \ldots, a_n)$ in $\mathbb{R}^n$. We have to show $z \in M$. Suppose $z \notin M$; then

$$f(x) := \sum_{i=1}^n (x_i - a_i)^2$$

is a function that does not vanish on $M$, and thus is an invertible element of $C^\infty(M)$. If an element $x$ is invertible, then so is its image under any homomorphism. Indeed, if $xy = 1$, then $\varphi(x)\varphi(y) = \varphi(xy) = \varphi(1) = 1$. But $\varphi(f) = \sum_i (a_i - a_i)^2 = 0$, which is not invertible. Hence we arrive at a contradiction, and the assumption $z \notin M$ is false.

A mixed classical state is a weighted mixture of pure classical states. That is, there is a real-valued function $\rho$ on the phase space $M$, called the density, taking nonnegative values
and integrating to one,

$$\int_M \rho(z)\, d\mu(z) = 1,$$

such that

$$\langle f \rangle = \int_M \rho(z) f(z)\, d\mu(z). \qquad (9.7)$$

The integration measure $d\mu$ depends on the application. In symplectic mechanics, where the symplectic Poisson bracket determines the Lie product, one uses the Liouville measure, defined in local coordinates $(q, p)$ by

$$d\mu(z) = dq_1 \cdots dq_n\, dp_1 \cdots dp_n.$$
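To make (9.7) concrete, consider a single particle in one dimension, so that $z = (q, p)$ and $d\mu(z) = dq\,dp$. The sketch below assumes a Gaussian density (our choice, with made-up widths `SIGMA_Q`, `SIGMA_P` and mass `MASS`), evaluates the normalization and the expectation of the kinetic energy $p^2/(2m)$ with a midpoint rule on a truncated square, and compares the latter with the exact value $\sigma_p^2/(2m)$.

```python
import math

SIGMA_Q, SIGMA_P, MASS = 0.5, 1.3, 2.0   # hypothetical parameters

def rho(q, p):
    """Gaussian phase-space density, normalized with respect to dq dp."""
    norm = 1.0 / (2 * math.pi * SIGMA_Q * SIGMA_P)
    return norm * math.exp(-q * q / (2 * SIGMA_Q ** 2) - p * p / (2 * SIGMA_P ** 2))

def expectation(f, n=200, width=8.0):
    """Midpoint-rule approximation of the integral of rho * f over phase space.

    The domain is truncated to |q| <= width * SIGMA_Q and |p| <= width * SIGMA_P.
    """
    hq = 2 * width * SIGMA_Q / n
    hp = 2 * width * SIGMA_P / n
    total = 0.0
    for a in range(n):
        q = -width * SIGMA_Q + (a + 0.5) * hq
        for b in range(n):
            p = -width * SIGMA_P + (b + 0.5) * hp
            total += rho(q, p) * f(q, p)
    return total * hq * hp

print(expectation(lambda q, p: 1.0))                                   # approximately 1
print(expectation(lambda q, p: p * p / (2 * MASS)), SIGMA_P ** 2 / (2 * MASS))
```

A pure state would correspond to letting both widths shrink to zero, in which case $\langle f \rangle$ tends to $f$ evaluated at the center of the Gaussian.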

Consider a system containing $N$ particles. Each particle has a momentum and a position; hence phase space is $6N$-dimensional. The Lie algebra is given by the relations $p_{ia} \angle q_{jb} = \delta_{ij}\delta_{ab}$ (all other Lie products between these generators vanishing), where $p_{ia}$ is the $a$th component of the momentum of the $i$th particle and $q_{jb}$ is the $b$th component of the position of the $j$th particle. The resulting Lie algebra is the Heisenberg algebra $h(3N)$.

In molecular mechanics, the Hamiltonian is of the simple form³

$$H = \sum_{i=1}^{N} \frac{p_i^2}{2m_i} + V(q_1, \ldots, q_N),$$

where the potential $V(q_1, \ldots, q_N)$ describes the potential energy of the configuration with positions $(q_1, \ldots, q_N)$.
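For a Hamiltonian of this form, Hamilton's equations read $\dot q_i = \partial H/\partial p_i = p_i/m_i$ and $\dot p_i = -\partial V/\partial q_i$. The sketch below integrates them with the velocity Verlet scheme (a standard symplectic integrator, not discussed in the text) for a hypothetical diatomic molecule whose potential is a single harmonic bond term $a(\|q_1 - q_2\| - r)^2$; all parameter values and helper names are made up for illustration. It checks that the energy is nearly conserved and the total momentum is conserved.

```python
import math

MASSES = (1.0, 2.0)     # hypothetical atomic masses m_1, m_2
STIFFNESS = 5.0         # hypothetical bond stiffness a
BOND_LENGTH = 1.0       # hypothetical equilibrium distance r

def bond_vector(q):
    return [q[0][k] - q[1][k] for k in range(3)]

def potential(q):
    r = math.sqrt(sum(x * x for x in bond_vector(q)))
    return STIFFNESS * (r - BOND_LENGTH) ** 2

def forces(q):
    """F_i = -dV/dq_i for the single harmonic bond."""
    d = bond_vector(q)
    r = math.sqrt(sum(x * x for x in d))
    coef = -2.0 * STIFFNESS * (r - BOND_LENGTH) / r
    f1 = [coef * x for x in d]
    return [f1, [-x for x in f1]]

def hamiltonian(q, p):
    kinetic = sum(sum(x * x for x in p[i]) / (2 * MASSES[i]) for i in range(2))
    return kinetic + potential(q)

def verlet_step(q, p, dt):
    """One velocity Verlet step for dq/dt = p/m, dp/dt = F(q)."""
    f = forces(q)
    p_half = [[p[i][k] + 0.5 * dt * f[i][k] for k in range(3)] for i in range(2)]
    q_new = [[q[i][k] + dt * p_half[i][k] / MASSES[i] for k in range(3)] for i in range(2)]
    f_new = forces(q_new)
    p_new = [[p_half[i][k] + 0.5 * dt * f_new[i][k] for k in range(3)] for i in range(2)]
    return q_new, p_new

q = [[0.0, 0.0, 0.0], [1.2, 0.1, 0.0]]
p = [[0.0, 0.3, 0.0], [0.0, -0.3, 0.1]]
e0 = hamiltonian(q, p)
for _ in range(5000):
    q, p = verlet_step(q, p, 1e-3)
total_momentum = [p[0][k] + p[1][k] for k in range(3)]
print(abs(hamiltonian(q, p) - e0), total_momentum)
```

A symplectic scheme is chosen here because it keeps the energy error bounded over long runs, which matches the Hamiltonian structure of the equations better than a generic integrator would.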

The states in symplectic mechanics are precisely the states of the form (9.7). If the system is such that we can measure at one instant of time all positions and momenta exactly (obviously an idealization), the configuration is precisely given by the point $z = (q_1, \ldots, q_N, p_1, \ldots, p_N)$ in phase space, and $\langle f \rangle = f(z)$ for all $f \in E$. In this case the density degenerates to a product of delta functions of the phase space coordinates. Thus classical pure states are equivalent to points $z$ in phase space, marking position and momentum of each point of interest, such as the centers of mass of the stars, planets, and moons making up a celestial system.



9.7 Molecular mechanics 

Consider a molecule consisting of $N$ atoms. The molecule is chemically described by assigning bonds between certain pairs of atoms, reflecting the presence of chemical forces that, in the absence of chemical reactions which may break bonds, hold these atoms close together. Thus a molecule may be thought of as a graph embedded in 3-dimensional space, in which some but usually not all atoms are connected by a bond. The chemical structure of the molecule is thus described by a connected graph, the formula of the molecule. (In the following, we ignore multiple bonds, which are just a way to indicate stronger binding than for single bonds, reflected in the interaction potential.) We write $i \sim j$ if there is a bond between atom $i$ and atom $j$, and similarly we write $i \sim j \sim k$ if there is a bond connecting $i$ and $j$ and there is a bond connecting $j$ and $k$. The notation is extended to longer chains: $i \sim j \sim k \sim l$.

³The more general form $H = \frac{1}{2}\sum_{i,j=1}^{N} G_{ij}(q_1, \ldots, q_N)p_ip_j + V(q_1, \ldots, q_N)$, where $G$ is a configuration-dependent inverse mass matrix, appears at various places in physics. When the potential $V$ is constant (so that we can put it to zero), the physical system is sometimes called a $\sigma$-model. Such models play an important role in modern high-energy physics and cosmology. Some authors prefer to include a potential in the definition of a $\sigma$-model.

Figure 9.1: Bond vectors, bond angles, and the dihedral angle

The interactions between the atoms in a molecule are primarily through the bonds, and to 
a much smaller extent through forces described by a pair potential and through multibody 
forces for joint influences of several adjacent bonds. 

The geometry is captured mathematically by assigning to the $j$th atom a 3-dimensional coordinate vector

$$q_j = \begin{pmatrix} q_{j1} \\ q_{j2} \\ q_{j3} \end{pmatrix} \in \mathbb{R}^3$$

specifying the position of the atom in space. If two atoms with labels $j$ and $k$ are joined by a chemical bond, we consider the corresponding bond vector $q_j - q_k$, with bond length $\|q_j - q_k\|$. At room temperature, the bonds between adjacent atoms $i$ and $j$ are quite rigid, meaning that the deviation from the average distance $r_{ij}$ is generally small and the force that tries to maintain the atoms at distance $r_{ij}$ is strong. In chemistry this is modeled by a term

$$\sum_{i \sim j} a_{ij}\big(\|q_i - q_j\| - r_{ij}\big)^2$$

in the Hamiltonian, where the $a_{ij}$ are stiffness constants, parameters determined by the particular chemical structure.

Consider two adjacent bonds $i \sim j$ and $j \sim k$. The bond angle α is the angle between the bond vectors $q_i - q_j$ and $q_k - q_j$. It can be computed from the formulas

$$\cos\alpha = \frac{(q_i - q_j) \cdot (q_k - q_j)}{\|q_i - q_j\|\,\|q_k - q_j\|}, \qquad \sin\alpha = \frac{\|(q_i - q_j) \times (q_k - q_j)\|}{\|q_i - q_j\|\,\|q_k - q_j\|},$$

and is thus invariant under the simultaneous action of the group ISO(3) on all 3 vectors. In most molecules the bond angles are determined by the interaction between the atoms in the molecule. There is thus an ISO(3)-invariant term

$$V_{\mathrm{angle}}(q_1, \dots, q_N) = \sum_{i \sim j \sim k} a_{ijk}\, \theta(q_i, q_j, q_k)$$

in the potential, with $\theta: (\mathbb{R}^3)^3 \to \mathbb{R}$ an ISO(3)-invariant function, and the $a_{ijk}$ are some parameters.
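The bond-angle formulas are easy to check numerically. The following pure-Python sketch computes α with atan2 from the cosine and sine formulas above, assembles a potential from bond and angle terms, and confirms that a rigid motion (an element of ISO(3)) leaves it unchanged. The harmonic choice $\theta = (\alpha - \alpha_0)^2$ for the angle term, the helper names, and all numerical parameters are illustrative assumptions, not taken from the text.

```python
import math

# Minimal 3-vector helpers (pure Python, no external libraries).
def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def norm(u):
    return math.sqrt(dot(u, u))

def bond_angle(qi, qj, qk):
    """Bond angle at atom j between q_i - q_j and q_k - q_j."""
    u, w = sub(qi, qj), sub(qk, qj)
    scale = norm(u) * norm(w)
    cos_a = dot(u, w) / scale
    sin_a = norm(cross(u, w)) / scale
    return math.atan2(sin_a, cos_a)  # combines both formulas; stable near 0 and pi

def potential(q, bonds, angles):
    """V_bond + V_angle; the harmonic form of the angle term is an assumption."""
    v = 0.0
    for (i, j), (a, r) in bonds.items():
        v += a * (norm(sub(q[i], q[j])) - r) ** 2
    for (i, j, k), (a, alpha0) in angles.items():
        v += a * (bond_angle(q[i], q[j], q[k]) - alpha0) ** 2
    return v

def rigid_motion(x):
    """An element of ISO(3): rotate by 0.7 rad about the z-axis, then translate."""
    c, s = math.cos(0.7), math.sin(0.7)
    return [c * x[0] - s * x[1] + 1.0, s * x[0] + c * x[1] - 2.0, x[2] + 0.5]

# Hypothetical triatomic molecule 0 ~ 1 ~ 2 with made-up parameters.
q = [[0.96, 0.0, 0.0], [0.0, 0.0, 0.0], [-0.25, 0.93, 0.1]]
bonds = {(0, 1): (5.0, 1.0), (1, 2): (5.0, 1.0)}
angles = {(0, 1, 2): (0.8, math.radians(104.5))}

v_before = potential(q, bonds, angles)
v_after = potential([rigid_motion(x) for x in q], bonds, angles)
print(v_before, abs(v_before - v_after) < 1e-12)
```

Using atan2 on both formulas, rather than arccos of the cosine alone, avoids a loss of accuracy for bond angles close to 0 or π.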

Finally, the dihedral angle $\omega = \angle(i \sim j \sim k \sim l)$ (or the complementary torsion angle $2\pi - \omega$) measures the relative orientation of two adjacent bond angles in a chain $i \sim j \sim k \sim l$ of atoms. It is defined as the angle between the normals to the planes determined by the atoms i, j, k and j, k, l, respectively, and can be calculated from

$$\cos\omega = \frac{\big((q_i - q_j) \times (q_k - q_j)\big) \cdot \big((q_j - q_k) \times (q_l - q_k)\big)}{\|(q_i - q_j) \times (q_k - q_j)\|\,\|(q_j - q_k) \times (q_l - q_k)\|}$$

and

$$\sin\omega = \frac{\big((q_k - q_j) \times (q_i - q_j)\big) \cdot (q_l - q_k)\,\|q_k - q_j\|}{\|(q_i - q_j) \times (q_k - q_j)\|\,\|(q_j - q_k) \times (q_l - q_k)\|}.$$
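A similar sketch for the dihedral angle evaluates the cosine and sine formulas directly and combines them with atan2 into a value in [0, 2π). The test chain is a hypothetical construction whose central bond lies on the z-axis, so that its dihedral angle equals a prescribed φ; the example then checks that a rotation followed by a translation does not change ω. Helper names are illustrative.

```python
import math

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def norm(u):
    return math.sqrt(dot(u, u))

def dihedral(qi, qj, qk, ql):
    """Return (cos w, sin w, w in [0, 2 pi)) for the chain i ~ j ~ k ~ l.

    Undefined if three consecutive atoms are collinear (a normal vanishes).
    """
    n1 = cross(sub(qi, qj), sub(qk, qj))   # normal to the plane through i, j, k
    n2 = cross(sub(qj, qk), sub(ql, qk))   # normal to the plane through j, k, l
    b = sub(qk, qj)                        # central bond vector
    denom = norm(n1) * norm(n2)
    cos_w = dot(n1, n2) / denom
    sin_w = dot(cross(b, sub(qi, qj)), sub(ql, qk)) * norm(b) / denom
    return cos_w, sin_w, math.atan2(sin_w, cos_w) % (2 * math.pi)

def rigid_motion(x):
    """Rotate about z (0.7 rad), then about x (0.4 rad), then translate."""
    c, s = math.cos(0.7), math.sin(0.7)
    x = [c * x[0] - s * x[1], s * x[0] + c * x[1], x[2]]
    c, s = math.cos(0.4), math.sin(0.4)
    x = [x[0], c * x[1] - s * x[2], s * x[1] + c * x[2]]
    return [x[0] + 1.0, x[1] - 2.0, x[2] + 0.5]

# Chain with the bond j ~ k along the z-axis; its dihedral angle is phi by construction.
phi = 2.0
chain = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0],
         [math.cos(phi), math.sin(phi), 1.0]]
cos_w, sin_w, w = dihedral(*chain)
w_moved = dihedral(*[rigid_motion(x) for x in chain])[2]
print(round(w, 12), abs(w - w_moved) < 1e-12)  # 2.0 True
```

Only proper motions are tested here: a reflection reverses the sign of the triple product in the sine formula and hence maps ω to 2π − ω, while cos ω stays unchanged.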

Again, the angle between the planes is ISO(3)-invariant and therefore described by an ISO(3)-invariant function $\psi: (\mathbb{R}^3)^4 \to \mathbb{R}$ of the positions of the four atoms. Hence to model the molecule there is a term

$$V_{\mathrm{dihedral}}(q_1, \dots, q_N) = \sum_{i \sim j \sim k \sim l} a_{ijkl}\, \psi(q_i, q_j, q_k, q_l)$$

in the Hamiltonian, with again $a_{ijkl}$ parameters. The total Hamiltonian is then taken to be

$$H = \sum_{j=1}^N \frac{p_j^2}{2m_j} + V_{\mathrm{bond}} + V_{\mathrm{angle}} + V_{\mathrm{dihedral}}.$$

The above Hamiltonian is of a special type; it is a member of the family of Hamiltonians of the form

$$H = \sum_{j=1}^N \frac{p_j^2}{2m_j} + V(q_1, \dots, q_N).$$

This family of Hamiltonians is favorable since there are no mixed terms between the momenta and the positions. Therefore, in the quantum theory there is no ambiguity in how the quantum mechanical Hamiltonian has to be written, since the momenta commute among themselves and the positions commute among themselves, too.

The group ISO(3) plays an important role here purely on symmetry grounds: how and where a molecule is located in $\mathbb{R}^3$ does not determine its chemical properties. Hence the Hamiltonian should depend on ISO(3)-invariant quantities only.

From the above we see that in practice one is given a representation $j: G \to GL(V)$ of some group in a vector space. The construction of a suitable Hamiltonian can be facilitated by knowing the invariants in the tensor representations $V \otimes V$, $V \otimes V \otimes V$, and so on. Even if V is irreducible, these tensor products may contain one or more one-dimensional subrepresentations on which G acts trivially; these span precisely the invariants in the tensor product. Hence knowing the irreducible representations of G is of great importance.

Now suppose that G has an irreducible representation j on V. Then $g \in G$ acts on $V \otimes V$ as follows: $g: v \otimes w \mapsto j(g)v \otimes j(g)w$, and similarly for higher order tensor products. It is almost never the case that the representation of G in tensor products of irreducible representations is again irreducible. But, in many cases, the decomposition of $V \otimes V$, $V \otimes V \otimes V$, etc. into irreducible representations is known.

In the case of ISO(3) the representation is in $\mathbb{R}^3$, which contains no invariants since all points of $\mathbb{R}^3$ form a single orbit. But, as we have shown in Section 9.7, $V \otimes V$, $V \otimes V \otimes V$ and $V \otimes V \otimes V \otimes V$ do have invariants: distances and angles.



9.8 An outlook to quantum field theory 

Quantum field theory is the area in physics where fields are treated by quantum mechanics. The way physicists think of this is more or less as follows. As we have seen in Chapter 2, classical linear field equations, such as the Maxwell equations, can be seen as describing a family of harmonic oscillators labeled by a continuum of pairs (p, s) of momenta p and spin or helicity s. Therefore, what has been treated above is nice, but for quantum field theory it is not enough: one needs an infinite number of oscillators. Treating such a system becomes mathematically sophisticated, because topological details start playing a dominant role. A way to deal with this heuristically, often employed by physicists, is to discretize space-time in a box. On each point of the lattice one places a harmonic oscillator; then there are just a finite number of oscillators. To get the quantum field theory, one considers the limit in which the size of the box goes to infinity and the spacing of the lattice goes to zero. Then the oscillators are no longer described by operators $a_k$ and $a_k^*$ that are labeled by vectors k, but by operators $a(x)$ and $a^*(x)$ that are labeled by the continuous four-vector index x. The limit might not exist...
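As a heuristic illustration of the lattice picture, the following sketch assumes (this is not in the text) a free real scalar field in one space dimension, discretized on a periodic chain of N sites with spacing a, with Hamiltonian $H = \sum_n \big(\tfrac12\pi_n^2 + \tfrac{1}{2a^2}(\phi_{n+1}-\phi_n)^2 + \tfrac12 m^2\phi_n^2\big)$. The nearest-neighbor coupling mixes the oscillators, but plane waves decouple them: the code checks that $\cos(2\pi k n/N)$ is an eigenvector of the coupling matrix with eigenvalue $\omega_k^2 = m^2 + \frac{4}{a^2}\sin^2(\pi k/N)$, so the lattice field is a finite family of independent harmonic oscillators labeled by k. Function names and parameter values are hypothetical.

```python
import math

def stiffness_matrix(n_sites, spacing, mass):
    """Matrix K with phi_n'' = -(K phi)_n for the assumed lattice Hamiltonian

    H = sum_n [ pi_n^2 / 2 + (phi_{n+1} - phi_n)^2 / (2 a^2) + m^2 phi_n^2 / 2 ]

    on a periodic chain of N sites with spacing a (one space dimension).
    """
    k = [[0.0] * n_sites for _ in range(n_sites)]
    for n in range(n_sites):
        k[n][n] += 2.0 / spacing**2 + mass**2
        k[n][(n + 1) % n_sites] -= 1.0 / spacing**2
        k[n][(n - 1) % n_sites] -= 1.0 / spacing**2
    return k

def mode_frequency(k_index, n_sites, spacing, mass):
    """omega_k = sqrt(m^2 + (4 / a^2) sin^2(pi k / N))."""
    return math.sqrt(mass**2 + (4.0 / spacing**2) * math.sin(math.pi * k_index / n_sites) ** 2)

def eigen_residual(k_index, n_sites=16, spacing=0.5, mass=1.0):
    """Max |K v - omega_k^2 v| for the plane wave v_n = cos(2 pi k n / N)."""
    kmat = stiffness_matrix(n_sites, spacing, mass)
    v = [math.cos(2 * math.pi * k_index * n / n_sites) for n in range(n_sites)]
    kv = [sum(kmat[r][c] * v[c] for c in range(n_sites)) for r in range(n_sites)]
    w2 = mode_frequency(k_index, n_sites, spacing, mass) ** 2
    return max(abs(kv[r] - w2 * v[r]) for r in range(n_sites))

# Every lattice momentum gives one decoupled oscillator with frequency omega_k.
for k_index in range(16):
    print(k_index, round(mode_frequency(k_index, 16, 0.5, 1.0), 4),
          eigen_residual(k_index) < 1e-9)
```

For a fixed physical momentum $p = 2\pi k/(Na)$ one has $\frac{4}{a^2}\sin^2(pa/2) \to p^2$ as $a \to 0$, so $\omega_k$ approaches the relativistic dispersion relation $\sqrt{m^2 + p^2}$ of the continuum field.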

In two space-time dimensions the limit is well-defined for interacting field theories, that is, for field theories where the different fields can interact. In the case of four dimensions, the correct limit is only known for non-interacting field theories. From experience we know that there is interaction, of course, so our description shows serious shortcomings. After the preceding description of representations, it is interesting to note that, in the field theory limits, the metaplectic representations still exist.

For 2-dimensional field theories with one space and one time dimension, this leads to satisfactory quantum field theories (such as conformal field theory). But for 4-dimensional field theories, the metaplectic representation is restricted to a class of operators not flexible enough for capturing the physics. This is the main mathematical obstacle for formulating a consistent framework for 4-dimensional quantum field theories.



Chapter 10 



Representation and classification 



10.1 Poisson representations 

Consider the Heisenberg algebra h(n) with the usual generators 1, $p_i$, and $q_i$, and the corresponding Lie-Poisson algebra E(h(n)). The subalgebra of all polynomials in $q_i, p_j$ is closed under the Lie product, and hence a Poisson subalgebra. More interestingly, there are several Lie subalgebras of low-degree polynomials, which we shall now explore. We write z for the 2n-tuple

$$z = (q_1, \dots, q_n, p_1, \dots, p_n)^T$$

of all the generators except 1. All linear polynomials without constant term can be written as $a \cdot z$ for some $a \in \mathbb{C}^{2n}$. On $\mathbb{C}^{2n}$ we introduce the antisymmetric bilinear form ω, represented in the given basis by the matrix J:

$$\omega(a, b) = a^T J b, \qquad J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$

where the entries in J are n × n-matrices, i.e., $1 = 1_n$, etc. The bilinear form ω is nondegenerate and antisymmetric, and we have

$$(a \cdot z) \angle (b \cdot z) = \omega(a, b). \qquad (10.1)$$

Any quadratic expression in E(h(n)) is a linear combination of expressions of the form $(a \cdot z)(b \cdot z)$. We consider two such expressions and calculate their Lie product, using the Leibniz rule and (10.1):

$$\begin{aligned}
(a \cdot z)(b \cdot z) \angle (c \cdot z)(d \cdot z) &= (a \cdot z)\big((b \cdot z) \angle (c \cdot z)(d \cdot z)\big) + \big((a \cdot z) \angle (c \cdot z)(d \cdot z)\big)(b \cdot z) \\
&= (b \cdot z)(c \cdot z)\,\omega(a, d) + (b \cdot z)(d \cdot z)\,\omega(a, c) \\
&\quad + (a \cdot z)(c \cdot z)\,\omega(b, d) + (a \cdot z)(d \cdot z)\,\omega(b, c),
\end{aligned}$$

which is a quadratic expression. Hence the homogeneous quadratic polynomials form a Lie subalgebra of E(h(n)). We show below that this Lie algebra is related to sp(2n, C). We proceed in physicist's fashion by looking at a conveniently chosen basis. In Section 15.4 we give a second derivation in a coordinate-independent fashion, which generalizes to the fermionic case and gives Lie algebras related to the real orthogonal groups.
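The four-term formula can be spot-checked numerically. The sketch below assumes the ordering $z = (q_1, \dots, q_n, p_1, \dots, p_n)$ with $J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, so that $q_i \angle p_j = -\delta_{ij}$, and uses the consequence of the Leibniz rule and (10.1) that the Lie product of two polynomials at a point z equals $\nabla f(z)^T J \nabla g(z)$. Integer data keep the comparison exact; the function names are illustrative.

```python
import random

def structure_matrix(n):
    """J = [[0, -1], [1, 0]] in n x n blocks, for z = (q_1..q_n, p_1..p_n)."""
    j = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        j[i][n + i] = -1   # q_i ∠ p_i = -1
        j[n + i][i] = 1    # p_i ∠ q_i = +1
    return j

def omega(a, b, j):
    """omega(a, b) = a^T J b."""
    return sum(a[r] * j[r][c] * b[c] for r in range(len(a)) for c in range(len(b)))

def lin(a, z):
    """Value of the linear form a . z at the phase-space point z."""
    return sum(x * y for x, y in zip(a, z))

def grad_product(a, b, z):
    """Gradient of (a . z)(b . z) at z, namely a (b . z) + b (a . z)."""
    return [ai * lin(b, z) + bi * lin(a, z) for ai, bi in zip(a, b)]

rng = random.Random(0)
n = 2
jm = structure_matrix(n)
for _ in range(200):
    a, b, c, d, z = ([rng.randint(-5, 5) for _ in range(2 * n)] for _ in range(5))
    # Left side: f ∠ g = grad f^T J grad g for f = (a.z)(b.z), g = (c.z)(d.z).
    lhs = omega(grad_product(a, b, z), grad_product(c, d, z), jm)
    # Right side: the four-term formula.
    rhs = (lin(b, z) * lin(c, z) * omega(a, d, jm) + lin(b, z) * lin(d, z) * omega(a, c, jm)
           + lin(a, z) * lin(c, z) * omega(b, d, jm) + lin(a, z) * lin(d, z) * omega(b, c, jm))
    assert lhs == rhs
print("four-term bracket formula holds on all samples")
```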

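The generators of h(n) are 1, $p_i$ and $q_i$. Consider the elements

$$Q_{ij} = q_i q_j, \qquad P_{ij} = p_i p_j, \qquad E_{ij} = \tfrac12 (q_i p_j + p_j q_i)$$

of the universal enveloping algebra. We have $Q_{ij} = Q_{ji}$ and $P_{ij} = P_{ji}$. We find the commutation relations:

$$\begin{aligned}
Q_{ij} \angle Q_{kl} &= P_{ij} \angle P_{kl} = 0, \\
E_{ij} \angle E_{kl} &= -\delta_{il} E_{kj} + \delta_{jk} E_{il}, \\
E_{ij} \angle Q_{kl} &= \delta_{jl} Q_{ik} + \delta_{jk} Q_{il}, \\
E_{ij} \angle P_{kl} &= -\delta_{il} P_{jk} - \delta_{ik} P_{jl}, \\
Q_{ij} \angle P_{kl} &= -\delta_{ik} E_{jl} - \delta_{jk} E_{il} - \delta_{jl} E_{ik} - \delta_{il} E_{jk}.
\end{aligned}$$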

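The Lie algebra sp(2n, C) is given by the complex 2n × 2n-matrices that preserve the above given J:

$$X \in \mathrm{sp}(2n, \mathbb{C}) \iff X^T J + J X = 0.$$

Taking X in block form as

$$X = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

we find $X \in \mathrm{sp}(2n, \mathbb{C})$ if and only if $C = C^T$, $B = B^T$ and $D = -A^T$. If we introduce the n × n-matrices $e_{ij}$ that are 1 on the ij-entry and zero elsewhere, we have $e_{ij} e_{kl} = \delta_{jk} e_{il}$, and the matrices

$$A_{ij} = \begin{pmatrix} e_{ij} & 0 \\ 0 & -e_{ji} \end{pmatrix}, \qquad B_{ij} = \begin{pmatrix} 0 & e_{ij} + e_{ji} \\ 0 & 0 \end{pmatrix}, \qquad C_{ij} = \begin{pmatrix} 0 & 0 \\ e_{ij} + e_{ji} & 0 \end{pmatrix}$$

(with $B_{ij} = B_{ji}$ and $C_{ij} = C_{ji}$) form a basis for sp(2n, C). With the commutator as Lie product, we find the commutation rules

$$\begin{aligned}
B_{ij} \angle B_{kl} &= C_{ij} \angle C_{kl} = 0, \\
A_{ij} \angle A_{kl} &= -\delta_{il} A_{kj} + \delta_{jk} A_{il}, \\
A_{ij} \angle B_{kl} &= \delta_{jl} B_{ik} + \delta_{jk} B_{il}, \\
A_{ij} \angle C_{kl} &= -\delta_{il} C_{jk} - \delta_{ik} C_{jl}, \\
B_{ij} \angle C_{kl} &= \delta_{ik} A_{jl} + \delta_{jk} A_{il} + \delta_{jl} A_{ik} + \delta_{il} A_{jk}.
\end{aligned}$$

Sending $Q_{ij}$ to $-B_{ij}$, $P_{ij}$ to $C_{ij}$ and $E_{ij}$ to $A_{ij}$ we obtain an isomorphism between the two Lie algebras.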
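The basis and the commutation rules can be verified mechanically. The following sketch builds $A_{ij}$, $B_{ij}$, $C_{ij}$ as integer matrices, checks $X^T J + JX = 0$ for the block matrix $J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ (the sign is an assumption; the condition is the same for $-J$), and compares every commutator with the stated rules for n = 2 and n = 3. All helper names are illustrative.

```python
def zeros(m):
    return [[0] * m for _ in range(m)]

def unit(n, i, j):
    """The n x n matrix e_ij: 1 in entry (i, j), zero elsewhere."""
    m = zeros(n)
    m[i][j] = 1
    return m

def add(x, y, s=1):
    """x + s * y."""
    return [[u + s * v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]

def mul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]

def comm(x, y):
    return add(mul(x, y), mul(y, x), -1)

def block(a, b, c, d):
    """Assemble the 2n x 2n matrix [[a, b], [c, d]] from n x n blocks."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]

def a_mat(n, i, j):
    return block(unit(n, i, j), zeros(n), zeros(n), add(zeros(n), unit(n, j, i), -1))

def b_mat(n, i, j):
    return block(zeros(n), add(unit(n, i, j), unit(n, j, i)), zeros(n), zeros(n))

def c_mat(n, i, j):
    return block(zeros(n), zeros(n), add(unit(n, i, j), unit(n, j, i)), zeros(n))

def j_mat(n):
    ident = [[int(r == c) for c in range(n)] for r in range(n)]
    return block(zeros(n), add(zeros(n), ident, -1), ident, zeros(n))

def in_sp(x, n):
    """Test X^T J + J X = 0."""
    jm = j_mat(n)
    xt = [list(col) for col in zip(*x)]
    return add(mul(xt, jm), mul(jm, x)) == zeros(2 * n)

def combo(n, *terms):
    """Linear combination sum(coef * matrix) of 2n x 2n matrices."""
    out = zeros(2 * n)
    for coef, m in terms:
        out = add(out, m, coef)
    return out

def delta(x, y):
    return int(x == y)

def check_relations(n):
    A, B, C, d = a_mat, b_mat, c_mat, delta
    r = range(n)
    for i in r:
        for j in r:
            assert all(in_sp(f(n, i, j), n) for f in (A, B, C))
            for k in r:
                for l in r:
                    assert comm(B(n, i, j), B(n, k, l)) == zeros(2 * n)
                    assert comm(C(n, i, j), C(n, k, l)) == zeros(2 * n)
                    assert comm(A(n, i, j), A(n, k, l)) == combo(
                        n, (-d(i, l), A(n, k, j)), (d(j, k), A(n, i, l)))
                    assert comm(A(n, i, j), B(n, k, l)) == combo(
                        n, (d(j, l), B(n, i, k)), (d(j, k), B(n, i, l)))
                    assert comm(A(n, i, j), C(n, k, l)) == combo(
                        n, (-d(i, l), C(n, j, k)), (-d(i, k), C(n, j, l)))
                    assert comm(B(n, i, j), C(n, k, l)) == combo(
                        n, (d(i, k), A(n, j, l)), (d(j, k), A(n, i, l)),
                        (d(j, l), A(n, i, k)), (d(i, l), A(n, j, k)))
    return True

print(check_relations(2), check_relations(3))
```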

We now allow for inhomogeneous quadratic polynomials by adjoining the linear forms of the algebra E(h(n)) to this Lie algebra. Everything commutes with the central element 1, so we will not write down the commutation relations with 1. Besides the relations among the $Q_{ij}$, $P_{ij}$ and $E_{ij}$ given above and the relations $q_i \angle q_j = p_i \angle p_j = 0$, $q_i \angle p_j = -\delta_{ij}$ (a multiple of 1), the commutation relations of the other basis elements are found to be

$$\begin{aligned}
Q_{ij} \angle q_k &= P_{ij} \angle p_k = 0, \\
Q_{ij} \angle p_k &= -\delta_{ik} q_j - \delta_{jk} q_i, \\
P_{ij} \angle q_k &= \delta_{ik} p_j + \delta_{jk} p_i, \\
E_{ij} \angle q_k &= \delta_{jk} q_i, \\
E_{ij} \angle p_k &= -\delta_{ik} p_j.
\end{aligned}$$

We define the Lie subalgebra L' of E(h(n)) as the Lie subalgebra of (inhomogeneous) quadratic expressions in the generators, and we define $L = L'/\mathbb{C}$ (the quotient by the constants), so that in L we have $q_i \angle p_j = 0$. Using the previously established isomorphism with sp(2n, C) it is not too hard to see that L is isomorphic to the Lie algebra isp(2n, C), which is defined as the Lie algebra of all (2n + 1) × (2n + 1)-matrices of the form

$$\begin{pmatrix} A & r \\ 0 & 0 \end{pmatrix},$$

with A a 2n × 2n-matrix in sp(2n, C) and r a 2n-vector. We have thus shown that L' is a central extension of isp(2n, C).



10.2 Linear representations 

Of great interest in quantum mechanics are certain realizations of Lie algebras and of Lie groups by means of operators on vector spaces. We therefore address the concept of a representation of a Lie algebra. In the previous chapter we have already given a short discussion of finite-dimensional representations of finite-dimensional Lie algebras.

10.2.1 Definition. 

(i) A (linear) representation of a Lie algebra L in an associative algebra E is a linear map $J: L \to E$ such that

$$J(f \angle g) = J(f)J(g) - J(g)J(f) \quad \text{for all } f, g \in L.$$

The representation is called faithful if J is injective. A linear representation on a (finite- or infinite-dimensional) vector space H is a representation in the algebra E = Lin H. In the case that E is the algebra of n × n matrices with entries in K one obtains the definition of Section 10.3. A linear representation is called irreducible when the only subspaces closed under multiplication by linear mappings of the form J(f) are 0 and H.

(ii) A unitary representation of a Lie *-algebra L is a linear map $J: L \to E$ into the *-algebra E = Lin H of continuous linear operators of a Euclidean space H (with * being the adjoint), satisfying

$$J(1) = 1, \qquad J(f^*) = J(f)^*, \qquad J(f \angle g) = \frac{i}{\hbar}\big(J(f)J(g) - J(g)J(f)\big).$$

Note that by Proposition 8.2.3, an associative algebra E becomes in a natural way a Lie algebra by defining $f \angle g = [f, g] = fg - gf$. Hence a representation of a Lie algebra L in an algebra E is a Lie algebra homomorphism from L to E, with E regarded as a Lie algebra. If the representation is faithful, the image of L is a Lie subalgebra of E isomorphic to L. In this case, one often identifies the elements of L with their images, and then speaks of an embedding of L into E. By the Theorem of Ado mentioned above, every finite-dimensional real Lie algebra has a faithful finite-dimensional representation.
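As a small instance of Definition 10.2.1, take L = ℝ³ with the cross product as Lie product (a Lie algebra isomorphic to so(3)) and let J(x) be the matrix of $v \mapsto x \times v$. The sketch checks $J(x \angle y) = J(x)J(y) - J(y)J(x)$ on random integer vectors; J is faithful because x can be read off the entries of J(x). The names `hat` and `commutator` are illustrative.

```python
import random

def cross(u, v):
    """Lie product of L = R^3: the vector cross product."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def hat(x):
    """J(x): the 3 x 3 matrix of v -> x × v."""
    return [[0, -x[2], x[1]],
            [x[2], 0, -x[0]],
            [-x[1], x[0], 0]]

def mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def commutator(a, b):
    ab, ba = mul(a, b), mul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(3)] for r in range(3)]

rng = random.Random(1)
for _ in range(200):
    x = [rng.randint(-9, 9) for _ in range(3)]
    y = [rng.randint(-9, 9) for _ in range(3)]
    # Representation property J(x ∠ y) = J(x)J(y) - J(y)J(x).
    assert hat(cross(x, y)) == commutator(hat(x), hat(y))
print("J = hat is a faithful linear representation of (R^3, ×)")
```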

The enveloping algebra. In a representation, the elements of L are represented by matrices or linear operators. From a given set of matrices we can form the algebra that these matrices generate, containing the unit matrix, all finite products and their linear combinations. This motivates us to consider an object that already encompasses this algebra for all representations: the universal enveloping algebra of a Lie algebra L. In general it is constructed by considering the tensor algebra T(L), which is given by

$$T(L) = \mathbb{K} \oplus L \oplus (L \otimes L) \oplus (L \otimes L \otimes L) \oplus \cdots = \bigoplus_{l=0}^{\infty} L^{\otimes l}.$$

One makes T(L) into an associative noncommutative algebra over the complex numbers by defining the product ab to be the tensor product $a \otimes b$.

Within T(L) we consider the two-sided ideal $\mathcal{J}$ generated by all elements of the form

$$x \otimes y - y \otimes x - [x, y]$$

for all x, y in L. Thus an element in $\mathcal{J}$ is a sum of elements of the form

$$a \otimes (x \otimes y - y \otimes x - [x, y]) \otimes b$$

for some $a, b \in T(L)$. The universal enveloping algebra of L is then defined as the associative noncommutative algebra $\mathcal{U}(L)$ over the complex numbers given by

$$\mathcal{U}(L) = T(L)/\mathcal{J}.$$

Another view on the universal enveloping algebra $\mathcal{U}(L)$ would be as follows. One chooses a basis $\{t_i\}$ for L and considers the associative noncommutative polynomial algebra in the generators $t_i$ while imposing the relation

$$t_i t_j - t_j t_i = t_i \angle t_j.$$

Thus we consider the associative algebra generated by 1 and by the generators of L and impose the Lie product, which in this case is the commutator, by hand. The algebra we obtain in this way is canonically isomorphic to the universal enveloping algebra $\mathcal{U}(L)$.

The universal enveloping algebra thus contains the Lie algebra, i.e., envelops the Lie algebra. This approach is very practical and therefore often used by physicists. There exists a more sophisticated definition, using a so-called universal property. One then proves that such an object is unique up to isomorphism and that the definition given above has this universal property. We do not expand on the definition using the universal property but refer to the literature, see, e.g., Jacobson [121], Knapp [137], or Fuchs & Schweigert [84]. It is because of this universal property that $\mathcal{U}(L)$ is usually called the universal enveloping algebra, and not just the enveloping algebra.
The main reason to define the universal enveloping algebra is to study the representations of the Lie algebra. Every representation of the Lie algebra induces a unique representation of the associative universal enveloping algebra, and conversely, every representation of the universal enveloping algebra induces a representation of the Lie algebra itself. In a sense, all finite-dimensional representations are maps of the associative universal enveloping algebra to the associative algebra of n × n-matrices for some n.

Casimir elements. An element $C \in \mathcal{U}(L)$ in the center of the universal enveloping algebra, i.e., one that commutes with all other elements of $\mathcal{U}(L)$, is called a Casimir element, or just Casimir, and sometimes also Casimir operator. If L has a representation in a vector space V, then for any $c \in \mathbb{K}$ the subspace $V_c = \{v \in V \mid Cv = cv\}$ is invariant under the action of L, precisely because C is in the center of $\mathcal{U}(L)$. Hence if the representation V is irreducible (and, say, finite-dimensional and complex, so that C has an eigenvalue c), $V_c$ must be the whole of V for this c, and the other $V_c$ are zero. That means that C acts as a scalar multiple of the identity in irreducible representations.
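For the rotation algebra, $C = \sum_a t_a^2$ is a Casimir element for any basis in which the commutators take the form $[t_a, t_b] = \lambda\,\epsilon_{abc}\, t_c$ with a fixed constant λ. The sketch evaluates C in two irreducible representations, the real 3-dimensional one by antisymmetric matrices and the spin-1/2 one by $\sigma_a/2$ (Pauli matrices), and checks that C commutes with the generators and comes out as a multiple of the identity: −2 and 3/4 respectively, i.e., $\mp j(j+1)$ with j = 1 and j = 1/2. The sign reflects that the first set of generators is anti-Hermitian. Function names are illustrative.

```python
def mul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def commutator(a, b):
    ab, ba = mul(a, b), mul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def casimir(gens):
    """C = sum_a t_a^2 evaluated in a representation given by matrices t_a."""
    n = len(gens[0])
    out = [[0] * n for _ in range(n)]
    for t in gens:
        out = add(out, mul(t, t))
    return out

# so(3) on R^3: antisymmetric generators L_a with [L_x, L_y] = L_z (cyclically).
so3 = [[[0, 0, 0], [0, 0, -1], [0, 1, 0]],
       [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
       [[0, -1, 0], [1, 0, 0], [0, 0, 0]]]
# Spin 1/2: S_a = sigma_a / 2 with [S_x, S_y] = i S_z (cyclically).
spin_half = [[[0, 0.5], [0.5, 0]],
             [[0, -0.5j], [0.5j, 0]],
             [[0.5, 0], [0, -0.5]]]

for gens in (so3, spin_half):
    c = casimir(gens)
    # C commutes with every generator, hence with everything they generate.
    assert all(x == 0 for t in gens for row in commutator(c, t) for x in row)
    print(c)
```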

The classical analogue of the universal enveloping algebra is the Lie-Poisson algebra discussed in Section 9.5.



10.3 Finite-dimensional representations 

We have already seen in Section 8.4 that the Lie algebra gl(n, K) has many interesting Lie subalgebras. Given an arbitrary Lie algebra L it is interesting to see how we can represent L as a Lie algebra of matrices. In this section we consider finite-dimensional Lie algebras and finite-dimensional representations in more detail.

For any vector space V over K we denote by gl(V) the Lie algebra of linear maps from V to V with the Lie product given by the commutator $f \angle g = fg - gf$. If V is identified with $\mathbb{K}^n$ we write gl(V) = gl(n, K) (see Section 8.4). A Lie algebra homomorphism $\phi: L \to gl(V)$ is called a finite-dimensional representation of L; the vector space V is then called an L-module. We call the representation complex if K = C and real if K = R. We have already seen that su(n) has a complex representation, since it is defined as a (real) subalgebra of gl(n, C).

Given a representation $\phi: L \to gl(V)$ we call W an invariant subspace of V if $\phi(f)w \in W$ for all $f \in L$ and all $w \in W$. The representation is called irreducible if the only invariant subspaces are 0 and V. We call the representation decomposable or fully reducible if for any invariant subspace W there is a complementary invariant subspace W' such that

$$V = W \oplus W'.$$

If $\phi_1: L \to gl(V_1)$ and $\phi_2: L \to gl(V_2)$ are representations of L we can form the direct sum representation $\phi_1 \oplus \phi_2: L \to gl(V_1 \oplus V_2)$ by defining

$$(\phi_1 \oplus \phi_2)(f)(v_1 + v_2) = \phi_1(f)(v_1) + \phi_2(f)(v_2)$$

for $f \in L$ and $v_1 \in V_1$, $v_2 \in V_2$. It is easy to check the representation property. In terms of matrices, the direct sum representation corresponds to the map

$$f \mapsto \begin{pmatrix} \phi_1(f) & 0 \\ 0 & \phi_2(f) \end{pmatrix}$$

in block matrices.

If $\phi_1: L \to gl(V)$ and $\phi_2: L \to gl(W)$ are representations of L we can form the tensor product representation $\phi_1 \otimes \phi_2$ as follows: each element f in L is sent to the linear map

$$(\phi_1 \otimes \phi_2)(f)(v \otimes w) = \phi_1(f)v \otimes w + v \otimes \phi_2(f)w \qquad (10.2)$$

for all $v \in V$ and $w \in W$. It is easy to check that (10.2) defines a representation.
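Both constructions can be tested on two copies of the 3-dimensional representation φ of (ℝ³, ×), where φ(x) is the matrix of $v \mapsto x \times v$. The sketch implements the Kronecker product by hand, checks the representation property for $\phi \oplus \phi$ and $\phi \otimes \phi$, and finally shows that $\sum_a e_a \otimes e_a$, which encodes the Euclidean inner product, is annihilated by every $(\phi \otimes \phi)(x)$: an invariant in $V \otimes V$. Function names are illustrative.

```python
import random

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]

def hat(x):
    """phi(x): matrix of v -> x × v on V = R^3."""
    return [[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0]]

def identity(n):
    return [[int(r == c) for c in range(n)] for r in range(n)]

def mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def commutator(a, b):
    ab, ba = mul(a, b), mul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def kron(a, b):
    """Kronecker product; the basis vector e_r ⊗ e_s has index r * len(b) + s."""
    p, q = len(b), len(b[0])
    return [[a[r // p][c // q] * b[r % p][c % q] for c in range(len(a[0]) * q)]
            for r in range(len(a) * p)]

def direct_sum(a, b):
    """Block-diagonal matrix diag(a, b)."""
    n, m = len(a), len(b)
    return [row + [0] * m for row in a] + [[0] * n + row for row in b]

def tensor_rep(x):
    """(phi ⊗ phi)(x) = phi(x) ⊗ 1 + 1 ⊗ phi(x), as in (10.2)."""
    return add(kron(hat(x), identity(3)), kron(identity(3), hat(x)))

def sum_rep(x):
    """(phi ⊕ phi)(x) = diag(phi(x), phi(x))."""
    return direct_sum(hat(x), hat(x))

rng = random.Random(2)
for _ in range(50):
    x = [rng.randint(-5, 5) for _ in range(3)]
    y = [rng.randint(-5, 5) for _ in range(3)]
    for rep in (tensor_rep, sum_rep):
        assert rep(cross(x, y)) == commutator(rep(x), rep(y))

# sum_a e_a ⊗ e_a has nonzero entries at indices 0, 4, 8 and encodes the inner product.
w = [1 if r in (0, 4, 8) else 0 for r in range(9)]
print([sum(row[c] * w[c] for c in range(9)) for row in tensor_rep([2, -1, 3])])  # nine zeros
```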

The adjoint representation $\mathrm{ad}: L \to gl(L)$ (see also Section 10.6) is defined by

$$\mathrm{ad}_f: g \mapsto [f, g].$$

The map ad is clearly linear, and from the Jacobi identity we see

$$\mathrm{ad}_x \mathrm{ad}_y(z) - \mathrm{ad}_y \mathrm{ad}_x(z) = \mathrm{ad}_{[x,y]}(z),$$

hence

$$[\mathrm{ad}_x, \mathrm{ad}_y] = \mathrm{ad}_{[x,y]}.$$

We can now rephrase the definition of an ideal (see Section 10.6) as follows: $I \subseteq L$ is an ideal if and only if I is an invariant subspace of the adjoint representation.
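For sl(2, C) with basis $e = \begin{pmatrix}0&1\\0&0\end{pmatrix}$, $f = \begin{pmatrix}0&0\\1&0\end{pmatrix}$, $h = \begin{pmatrix}1&0\\0&-1\end{pmatrix}$, the adjoint representation can be written down explicitly. The sketch computes the 3 × 3 matrices of $\mathrm{ad}_x$ in this basis and checks $[\mathrm{ad}_x, \mathrm{ad}_y] = \mathrm{ad}_{[x,y]}$ on random integer elements; the coordinate map and function names are illustrative.

```python
import random

def mul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def commutator(a, b):
    ab, ba = mul(a, b), mul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

E = [[0, 1], [0, 0]]
F = [[0, 0], [1, 0]]
H = [[1, 0], [0, -1]]
BASIS = [E, F, H]

def coords(x):
    """Coordinates of a traceless [[a, b], [c, -a]] in the basis (e, f, h): (b, c, a)."""
    return [x[0][1], x[1][0], x[0][0]]

def from_coords(v):
    return [[v[2], v[0]], [v[1], -v[2]]]

def ad(x):
    """Matrix of ad_x in the basis (e, f, h); column k holds coords of [x, basis_k]."""
    cols = [coords(commutator(x, b)) for b in BASIS]
    return [[cols[k][r] for k in range(3)] for r in range(3)]

rng = random.Random(3)
for _ in range(200):
    x = from_coords([rng.randint(-4, 4) for _ in range(3)])
    y = from_coords([rng.randint(-4, 4) for _ in range(3)])
    # Representation property [ad_x, ad_y] = ad_[x, y].
    assert commutator(ad(x), ad(y)) == ad(commutator(x, y))
print(ad(H))  # [[2, 0, 0], [0, -2, 0], [0, 0, 0]]
```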



10.4 Representations of Lie groups 

Lie group representations have a definition similar to that of Lie algebra representations.

10.4.1 Definition. A representation of a Lie group G in an associative algebra E with identity 1 is a map $U: G \to E$ such that

$$U(fg) = U(f)U(g), \qquad U(1) = 1.$$

The representation is called faithful if U is injective. If E = Lin H, one speaks again of a linear representation on H. A linear representation is called irreducible if the only subspaces closed under multiplication by linear mappings of the form U(f) are 0 and H. A unitary representation of a Lie group G is a linear representation in the *-algebra E = Lin H of continuous linear operators of a Euclidean space H, satisfying

$$U(f)^* U(f) = 1.$$

It is easy to see that $U(f^{-1}) = U(f)^{-1}$, and in the unitary case, $U(f^{-1}) = U(f)^{-1} = U(f)^*$.

Note that the invertible elements of E form a group, and a Lie group representation of G in E is a group homomorphism of G into this group. Again, if the representation is faithful, one may identify group elements with their images under the representation, and then has an embedding of G into the algebra E. Thus if E is the algebra of n × n matrices with entries in K we get a group homomorphism of G into GL(n, K). For K = C the representation is unitary if the image of G lies inside U(n).

If a Lie algebra representation $J: L \to E$ is an embedding, we can get something that is close to a representation of the Lie group by exponentiation, i.e., by defining

$$U(e^f) := e^{J(f)} = \sum_{k=0}^{\infty} \frac{J(f)^k}{k!},$$

provided this converges for all $f \in L$ in the topology of E. Later in this section we go deeper into the question of how to get a Lie group representation from a Lie algebra representation and the problems one encounters. On the other hand, given a representation U of a Lie group G with Lie algebra L we can get a representation J of the Lie algebra by differentiation, i.e., by defining

$$J(f) := \frac{d}{dt} U(e^{tf})\Big|_{t=0},$$

provided the derivative always exists. In finite dimensions, both constructions work generally; in infinite dimensions, suitable assumptions are needed to make the constructions work.
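In the finite-dimensional case both constructions can be carried out numerically. The sketch below assumes the defining representation of the rotation group, so that U(g) = g, and takes for J(f) the generator of rotations about the z-axis; it sums the exponential series, compares with the familiar rotation matrix, checks $U(e^{sf})U(e^{tf}) = U(e^{(s+t)f})$, and recovers J(f) from a central difference quotient. The truncation at 40 terms and the step size are ad hoc choices, and all names are illustrative.

```python
import math

def identity(n):
    return [[float(r == c) for c in range(n)] for r in range(n)]

def mul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def lincomb(a, b, s=1.0):
    """a + s * b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

def expm_series(x, terms=40):
    """e^x = sum_{k=0}^{terms-1} x^k / k!, a truncated power series."""
    result = identity(len(x))
    term = identity(len(x))
    for k in range(1, terms):
        term = scale(mul(term, x), 1.0 / k)
        result = lincomb(result, term)
    return result

def max_diff(a, b):
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

GEN_Z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # J(f): rotations about z

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Exponentiation: exp(t J(f)) is the rotation by angle t.
t = 1.3
print(max_diff(expm_series(scale(GEN_Z, t)), rot_z(t)) < 1e-12)
# Homomorphism property U(e^{sf}) U(e^{tf}) = U(e^{(s+t)f}).
s = 0.4
print(max_diff(mul(expm_series(scale(GEN_Z, s)), expm_series(scale(GEN_Z, t))),
               expm_series(scale(GEN_Z, s + t))) < 1e-12)
# Differentiation: (U(e^{hf}) - U(e^{-hf})) / (2h) recovers J(f) up to O(h^2).
h = 1e-5
deriv = scale(lincomb(expm_series(scale(GEN_Z, h)), expm_series(scale(GEN_Z, -h)), -1.0),
              1.0 / (2 * h))
print(max_diff(deriv, GEN_Z) < 1e-8)
```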



The group G acts on the Lie algebra L. We discuss this briefly for groups of matrices. For every element $g \in G \subseteq GL(n, \mathbb{K})$ we define Ad(g), which is a linear transformation of L given by

$$\mathrm{Ad}(g): X \mapsto gXg^{-1}.$$

It holds that $\mathrm{Ad}(g)X \in L$, which we will not prove. The interested reader is referred to Knapp [137], Helgason [111], Frankel [81], or Kirillov [134]. For all the examples discussed so far, the reader can check it by hand. The map $\mathrm{Ad}: g \mapsto \mathrm{Ad}(g)$ clearly satisfies $\mathrm{Ad}(gh) = \mathrm{Ad}(g)\mathrm{Ad}(h)$ and is thus a representation, which is called the adjoint representation of the group G.
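For rotations the claims can be checked by hand or numerically, using $g^{-1} = g^T$. The following sketch (helper names illustrative) verifies that $\mathrm{Ad}(g)X$ is again antisymmetric, hence in so(3), that $\mathrm{Ad}(gh) = \mathrm{Ad}(g)\mathrm{Ad}(h)$, and the identity $g\,\hat v\,g^{T} = \widehat{gv}$ for $g \in SO(3)$, where $\hat v$ denotes the matrix of $w \mapsto v \times w$.

```python
import math

def mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, v):
    return [sum(a[r][k] * v[k] for k in range(3)) for r in range(3)]

def hat(x):
    """Antisymmetric matrix of w -> x × w, an element of so(3)."""
    return [[0.0, -x[2], x[1]], [x[2], 0.0, -x[0]], [-x[1], x[0], 0.0]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def big_ad(g, x):
    """Ad(g) X = g X g^{-1}; for a rotation matrix g we may use g^{-1} = g^T."""
    return mul(mul(g, x), transpose(g))

def max_diff(a, b):
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))

g, h = rot_z(0.7), rot_x(-1.1)
v = [0.3, -2.0, 1.5]
x = hat(v)
y = big_ad(g, x)
neg_yt = [[-e for e in row] for row in transpose(y)]
print(max_diff(y, neg_yt) < 1e-12)                                      # Ad(g)X lies in so(3)
print(max_diff(big_ad(mul(g, h), x), big_ad(g, big_ad(h, x))) < 1e-12)  # Ad(gh) = Ad(g)Ad(h)
print(max_diff(y, hat(matvec(g, v))) < 1e-12)                           # Ad(g) hat(v) = hat(g v)
```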



Universal covering group. For Lie algebra representations an important construct is the universal enveloping algebra. For Lie groups there is an analogue. Above we mentioned that by differentiating a representation of a Lie group, one obtains a representation of the corresponding Lie algebra. By exponentiating a representation of the Lie algebra one gets a representation for those group elements that can be written as exponentials. If a group is not connected, one does not obtain a representation of the whole group in this way.

Other problems arise when the group is not simply connected. For example, SO(3) is not simply connected and therefore certain representations of the Lie algebra cannot be lifted to representations of the Lie group; the spin representations become multivalued. Yet other problems arise when two Lie groups that are fundamentally different have isomorphic Lie algebras. Consider for example the group U(1) of complex numbers of absolute value 1. As a manifold U(1) is just the circle $S^1$. The Lie algebra of U(1) is the one-dimensional abelian Lie algebra (there is only one, up to isomorphism). Now consider the Lie group R where the group operation is addition, $a \cdot b = a + b$. Then R is a one-dimensional abelian Lie group with a one-dimensional Lie algebra. The Lie algebras of U(1) and R are isomorphic, but the Lie groups are totally different. When we want to lift a Lie algebra representation of either of them to a Lie group representation, which group do we choose then?
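The standard example of a covering is SU(2) → SO(3), whose kernel is $\{1, -1\}$. The sketch assumes the usual covering map $R_{ab} = \tfrac12 \mathrm{tr}(\sigma_a U \sigma_b U^*)$ with the Pauli matrices $\sigma_a$; it shows that the spin-1/2 rotation by 2π is −1 in SU(2) while its image in SO(3) is the identity, and that U and −U give the same rotation. This is why the spin-1/2 representation, viewed as a map on SO(3), is two-valued. Function names are illustrative.

```python
import cmath
import math

SIGMA = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]  # Pauli matrices

def mul2(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

def dagger(a):
    """Conjugate transpose U^*."""
    return [[a[c][r].conjugate() for c in range(2)] for r in range(2)]

def su2_rotation_z(theta):
    """exp(-i theta sigma_z / 2): the spin-1/2 representative of a z-rotation."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

def to_so3(u):
    """Covering map SU(2) -> SO(3), R_ab = tr(sigma_a U sigma_b U^*) / 2."""
    ud = dagger(u)
    rot = []
    for a in range(3):
        row = []
        for b in range(3):
            m = mul2(SIGMA[a], mul2(u, mul2(SIGMA[b], ud)))
            row.append(((m[0][0] + m[1][1]) / 2).real)
        rot.append(row)
    return rot

u_full = su2_rotation_z(2 * math.pi)   # approximately -1: a full turn is -1 in SU(2)
rot_full = to_so3(u_full)              # approximately the identity in SO(3)
u = su2_rotation_z(1.0)
minus_u = [[-x for x in row] for row in u]
same = max(abs(x - y) for ra, rb in zip(to_so3(u), to_so3(minus_u)) for x, y in zip(ra, rb))
print(same < 1e-12)  # True: U and -U cover the same rotation
```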

These topological considerations lead one to the question whether there is a unique simply connected Lie group for a given Lie algebra. The answer is positive: for every real finite-dimensional Lie algebra L there is a unique (up to isomorphism) simply connected Lie group G with Lie algebra L. So given a connected Lie group H with Lie algebra L one can construct a unique simply connected Lie group G with Lie algebra L. The group G is called the universal covering group of H. Then H and G are locally isomorphic; there are small neighborhoods of the identity in both groups on which H and G are diffeomorphic to each other, compatibly with the group operations. The Lie group H is then a quotient of G; $H = G/D$ for some discrete normal subgroup D of G.

The exponential map $L \to G$ is in general not surjective; however, the image of the exponential map generates an interesting subgroup of G, the connected component of the identity, denoted $G_0$. If $g \in G$ lies in the connected component, we can write $g = e^{f_1} \cdots e^{f_k}$ for some Lie algebra elements $f_1, \dots, f_k$. Given a Lie algebra representation we can uniquely lift it to a representation of the connected component $G_0$ of the Lie group if $G_0$ is simply connected. Therefore, in this case, the representations of the Lie algebra L are in one-to-one correspondence with the representations of the universal covering group corresponding to L.

Now let H be a Lie group with Lie algebra L and with universal covering group G such that $H = G/K$ for some normal subgroup K of G. Given a Lie algebra representation of L, we get a Lie group representation of G. If the normal subgroup K is in the kernel of the representation, we get a well-defined representation of H as well. Conversely, given a representation of H, we get a representation of G by first projecting to G/K, so that K is in the kernel. Hence, representations of H are in one-to-one correspondence with representations of G that map K to the unit matrix.



10.5 Unitary representations of the Poincare group 

This section is neither in a good form nor complete in contents.

Knowing the simple and semisimple Lie algebras is of course interesting, but in physics there are also non-semisimple Lie algebras that play an important role. In this section we treat one of the most important Lie algebras in physics, the Poincare Lie algebra. The Poincare Lie algebra is not semisimple since it contains an abelian (and thus solvable) ideal, namely the translations.

In physics, the irreducible unitary representations of the Poincare algebra correspond to elementary particles, more precisely to particles considered at distances so large that their internal structure can be safely ignored. Using the Casimir operators in the universal enveloping algebra, one can label the different representations, since a Casimir must take a constant value in any irreducible representation.

There are two independent Casimirs. One is the number $p^2$, which is an invariant since the Minkowski inner product is invariant. With our choice of signature (− + + +), one has

$$p^2 = -p_0^2 + p_1^2 + p_2^2 + p_3^2 = -(mc)^2$$

with a constant m called the mass of the representation (or of the associated particles); for the signature (+ − − −), we have instead $p^2 = p_0^2 - \vec p^{\,2} = (mc)^2$.

In physically relevant representations, $m \ge 0$ and $p_0 > 0$.
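The mass Casimir and the frame dependence of the energy can be illustrated with explicit four-momenta. The sketch assumes units with c = 1, the signature (− + + +), and made-up momenta; it checks that a boost changes $p_0$ but not the mass, that a rotation leaves $p_0$ untouched, and that a lightlike momentum stays massless. Function names are illustrative.

```python
import math

def minkowski_square(p):
    """p^2 in the signature (- + + +): -p_0^2 + p_1^2 + p_2^2 + p_3^2."""
    return -p[0] ** 2 + p[1] ** 2 + p[2] ** 2 + p[3] ** 2

def mass(p, c=1.0):
    """m from p^2 = -(m c)^2; clamps tiny negative rounding errors for massless p."""
    return math.sqrt(max(-minkowski_square(p), 0.0)) / c

def boost_x(p, rapidity):
    """Lorentz boost along the 1-axis: mixes p_0 (energy / c) with p_1."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [ch * p[0] + sh * p[1], sh * p[0] + ch * p[1], p[2], p[3]]

def rotate_12(p, angle):
    """Rotation in the (1, 2)-plane: leaves p_0 untouched."""
    co, si = math.cos(angle), math.sin(angle)
    return [p[0], co * p[1] - si * p[2], si * p[1] + co * p[2], p[3]]

# Hypothetical particle of mass 2 at rest, units with c = 1.
p = [2.0, 0.0, 0.0, 0.0]
q = boost_x(p, 0.8)
r = rotate_12(q, 1.1)
print(q[0], mass(q), mass(r))  # energy grows to 2 cosh(0.8); the mass stays 2 up to rounding
print(r[0] == q[0])            # True: rotations do not change the energy
photon = [1.0, 0.0, 0.6, 0.8]
print(mass(boost_x(photon, -0.3)))  # massless in every frame, up to rounding
```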

A second Casimir accounts for the spin s of the representation; for unitary representations, it is quantized, and takes nonnegative half-integral values $s = 0, \tfrac12, 1, \tfrac32, \dots$. The particles are called bosons if the spin of this representation is integral, and fermions otherwise, i.e., if the spin is half an odd integer. For example, electrons have spin s = 1/2 and are fermions, while photons have spin s = 1 and are bosons. The name "spin" derives from relations to the representation theory of the rotation group; see Section 15.1, where also the dichotomic nature of integral and nonintegral spin is explained, which justifies using different names for bosons and fermions.

Clearly, representations which differ in mass or spin are nonequivalent. Less trivial is the fact that, among the physical representations (i.e., those with $m \ge 0$ and $p_0 > 0$), there is an up to equivalence unique irreducible representation for each combination of $m > 0$ and $s \in \{0, \tfrac12, 1, \tfrac32, \dots\}$. In the massless case m = 0, there are precisely two for each $s \in \{\tfrac12, 1, \tfrac32, \dots\}$: a right-handed and a left-handed one.

Given an irreducible unitary representation, we can choose a basis such that the components of p act diagonally, since they are Hermitian and commute. Thus we can assign to a vector in the representation the four momentum components. The momentum components will also be denoted $p_\mu$. The number $E = p_0 c$ is called the energy, and depends on the basis chosen, since the transformations in so(3, 1) mix the momenta. Having fixed a basis of the translations, there is only an SO(3) subgroup that leaves the energy invariant. Intuitively this is clear: rotating a reference frame does not change the energies. In general, for a given basis, the subgroup of SO(3, 1) that leaves the vector (1, 0, 0, 0) invariant is SO(3), and the elements of this SO(3) subgroup are rotations. There are three independent generators of so(3, 1) that do not leave (1, 0, 0, 0) invariant; these transformations and their linear combinations are called Lorentz boosts in the physics literature. The Lorentz boosts mix time and space coordinates. A basis of the Poincare Lie algebra thus consists of the generators of three rotations, three Lorentz boosts and four translations.



10.6 Finite-dimensional semisimple Lie algebras 

For finite-dimensional Lie algebras a lot is known about the general structure; here we give an overview of the results most useful in physics. Since no details are given, this section may be skipped on first reading.

Classifying all finite-dimensional Lie algebras is in a certain sense possible; all finite-dimensional real or complex Lie algebras are a semidirect product of a semisimple and a solvable Lie algebra (to be defined below). The classification of all semisimple real and complex Lie algebras is completely understood. It turns out that the semisimple complex Lie algebras can be classified by studying certain root systems. The semisimple real Lie algebras are obtained by applying the classification of complex Lie algebras to the complexified Lie algebras and then finding all ways of turning the resulting complex Lie algebras into a Lie *-algebra; their real parts then give all semisimple real Lie algebras.

For a simple Lie algebra, every nonzero representation is faithful, since its kernel is an ideal; hence such a representation is nothing more than an embedding into a matrix Lie algebra gl(n, C), realizing the Lie algebra elements by matrices. Every Lie algebra L comes with a canonical representation, the adjoint representation, denoted ad, which maps an element f to the Hamiltonian derivative $\mathrm{ad}_f$ in direction f, introduced in Section 8.2. Thus to each Lie algebra element f we assign a linear operator on a vector space. The vector space is the Lie algebra itself, and an element f of the Lie algebra is represented by the linear transformation $\mathrm{ad}_f$ that maps an element $g \in L$ to $f \angle g$. In the mathematical literature, one often writes the Lie product as a commutator. Then the definition takes the form

$$\mathrm{ad}_x(y) = [x, y].$$

Due to the Jacobi identity this indeed defines a representation. For finite-dimensional Lie algebras, there is a canonical symmetric bilinear form called the (Cartan-)Killing form, which we write as $B_{CK}$ and which is defined by

$$B_{CK}(x, y) = \mathrm{tr}(\mathrm{ad}_x \mathrm{ad}_y).$$

Due to the Jacobi identity, the Killing form is invariant,

$$B_{CK}([x, z], y) = B_{CK}(x, [z, y]).$$

Recall that an ideal of a Lie algebra L is a subspace I in L such that $L \angle I \subseteq I$. Thus an ideal is an invariant subspace under the adjoint action of the Lie algebra on itself. A Lie algebra L is called simple if it is not one-dimensional and has no nontrivial ideals (distinct from 0 and L). Thus the adjoint action of L on itself has no nontrivial invariant subspace. A Lie algebra is semisimple if it is a direct sum of simple Lie algebras. There is a convenient criterion for a Lie algebra to be semisimple:

10.6.1 Theorem. (Lemma of Cartan)

A Lie algebra is semisimple if and only if its Killing form is nondegenerate.

Proof. The proof can be found in many Lie algebra textbooks such as Jacobson [121], Humphreys [117], Knapp [137], or Fulton & Harris [85]. □

A finite-dimensional real Lie algebra L is called compact if its Killing form is negative definite. In this case, the Lemma of Cartan implies that L is semisimple. For example, the Lie algebra so(3) is compact, whereas so(2, 1) is noncompact. However, note that Lie algebras are vector spaces and therefore not compact as topological spaces in the usual topology.
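The Lemma of Cartan and the notion of compactness are easy to test on small examples. The sketch below computes the Killing form $B_{CK}(b_i, b_j) = \mathrm{tr}(\mathrm{ad}_{b_i}\mathrm{ad}_{b_j})$ exactly, using `fractions.Fraction`, from a list of basis matrices. The bases, in particular the realization of the Heisenberg algebra by strictly upper triangular 3 × 3 matrices and of so(2, 1) by matrices preserving diag(1, 1, −1), are illustrative choices, as are the function names.

```python
from fractions import Fraction

def mul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def comm(a, b):
    ab, ba = mul(a, b), mul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def coords(m, basis):
    """Exact coordinates of m in a linearly independent list of matrices (Gauss-Jordan)."""
    d, n = len(basis), len(m)
    rows = [[Fraction(b[r][c]) for b in basis] + [Fraction(m[r][c])]
            for r in range(n) for c in range(n)]
    pivots, top = [], 0
    for col in range(d):
        piv = next((r for r in range(top, len(rows)) if rows[r][col] != 0), None)
        if piv is None:
            continue
        rows[top], rows[piv] = rows[piv], rows[top]
        for r in range(len(rows)):
            if r != top and rows[r][col] != 0:
                f = rows[r][col] / rows[top][col]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[top])]
        pivots.append(col)
        top += 1
    sol = [Fraction(0)] * d
    for r, col in enumerate(pivots):
        sol[col] = rows[r][d] / rows[r][col]
    return sol

def ad(x, basis):
    """Matrix of ad_x; column k holds the coordinates of [x, basis_k]."""
    cols = [coords(comm(x, b), basis) for b in basis]
    return [[cols[k][r] for k in range(len(basis))] for r in range(len(basis))]

def killing(basis):
    """B_CK(b_i, b_j) = tr(ad_{b_i} ad_{b_j})."""
    ads = [ad(b, basis) for b in basis]
    d = len(basis)
    return [[sum(x[r][k] * y[k][r] for r in range(d) for k in range(d)) for y in ads]
            for x in ads]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

algebras = {
    "sl(2)": [[[0, 1], [0, 0]], [[0, 0], [1, 0]], [[1, 0], [0, -1]]],   # e, f, h
    "heisenberg": [[[0, 1, 0], [0, 0, 0], [0, 0, 0]],                   # p
                   [[0, 0, 0], [0, 0, 1], [0, 0, 0]],                   # q
                   [[0, 0, 1], [0, 0, 0], [0, 0, 0]]],                  # central 1
    "so(3)": [[[0, 0, 0], [0, 0, -1], [0, 1, 0]],
              [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
              [[0, -1, 0], [1, 0, 0], [0, 0, 0]]],
    "so(2,1)": [[[0, -1, 0], [1, 0, 0], [0, 0, 0]],                     # rotation
                [[0, 0, 1], [0, 0, 0], [1, 0, 0]],                      # boost
                [[0, 0, 0], [0, 0, 1], [0, 1, 0]]],                     # boost
}
for name, basis in algebras.items():
    b = killing(basis)
    print(name, [[int(x) for x in row] for row in b], "det =", det3(b))
```

The output shows a nondegenerate form for sl(2) (determinant −128) and so(3), a vanishing form for the Heisenberg algebra, which is therefore not semisimple, the negative definite form −2·1 for so(3), hence compactness, and the indefinite form diag(−2, 2, 2) for so(2, 1).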
For a given Lie algebra one may form the so-called derived series of ideals:

$$L_0 = L, \qquad L_{n+1} = L_n \angle L_n, \quad n \ge 0.$$

The Lie algebra L is called solvable if there is an n such that $L_n = 0$. A theorem of Levi says that every Lie algebra is a semidirect sum of a semisimple part P and a solvable ideal S (that is, S is a solvable Lie subalgebra that is an ideal in L), such that $P \cong L/S$. It follows that an important part of the classification of all Lie algebras is the classification of the simple Lie algebras.
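The derived series can be computed exactly for matrix Lie algebras. In the sketch below, $L_{n+1}$ is spanned by the commutators of pairs of basis elements of $L_n$, and a basis is extracted by exact Gaussian elimination. The examples (upper triangular 3 × 3 matrices, the Heisenberg algebra realized by strictly upper triangular matrices, and sl(2)) and the function names are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations

def mul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def comm(a, b):
    ab, ba = mul(a, b), mul(b, a)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]

def independent(mats):
    """A maximal linearly independent sublist of mats (exact Fraction elimination)."""
    echelon, chosen = [], []
    for m in mats:
        v = [Fraction(x) for row in m for x in row]
        for pivot, row in echelon:
            if v[pivot] != 0:
                f = v[pivot] / row[pivot]
                v = [x - f * y for x, y in zip(v, row)]
        lead = next((k for k, x in enumerate(v) if x != 0), None)
        if lead is not None:
            echelon.append((lead, v))
            chosen.append(m)
    return chosen

def derived_series_dims(basis, max_steps=10):
    """Dimensions of L_0 = L, L_{n+1} = L_n ∠ L_n until 0 or stabilization."""
    dims, current = [len(basis)], basis
    for _ in range(max_steps):
        current = independent([comm(x, y) for x, y in combinations(current, 2)])
        dims.append(len(current))
        if dims[-1] in (0, dims[-2]):
            break
    return dims

def unit(n, i, j):
    return [[int((r, c) == (i, j)) for c in range(n)] for r in range(n)]

upper = [unit(3, i, j) for i in range(3) for j in range(i, 3)]  # upper triangular 3 x 3
sl2 = [[[0, 1], [0, 0]], [[0, 0], [1, 0]], [[1, 0], [0, -1]]]
heis = [unit(3, 0, 1), unit(3, 1, 2), unit(3, 0, 2)]

print(derived_series_dims(upper))  # [6, 3, 1, 0]: solvable
print(derived_series_dims(sl2))    # [3, 3]: L_1 = L, not solvable
print(derived_series_dims(heis))   # [3, 1, 0]: solvable
```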

The classification of the finite-dimensional complex simple Lie algebras can be done by
classifying certain objects called finite root systems, associated to a choice of maximal
commutative subalgebras called Cartan subalgebras. Associated to each root system is
a finite reflection group, i.e., a group generated by reflections (elements whose square is 1).
The finite reflection groups (also called Coxeter groups) which are not direct products of
nontrivial smaller reflection groups arise as symmetry groups of regular polytopes. They have
all been classified by Coxeter, and fall into five infinite families denoted by $A_n$ (simplices),
$B_n$, $C_n$, $D_n$ (all three related to cubes and crosspolytopes), and $I_n$ (polygons), and a
few sporadic cases denoted by $E_6$, $E_7$, $E_8$, $F_4$, $H_3$, and $H_4$ ($H_3$ is the symmetry
group of the dodecahedron and the icosahedron).

Most of the finite reflection groups are also realized as symmetry groups of a root system.
All root systems give rise to semisimple Lie algebras, and irreducible root systems lead to
simple Lie algebras. The classification says there are four infinite series of Lie algebras
denoted $A_n$, $B_n$, $C_n$ for $n \ge 1$ and $D_n$ for $n \ge 4$, and five exceptional Lie algebras
called $E_6$, $E_7$, $E_8$, $G_2$ and $F_4$. The corresponding reflection groups have the same
labels, except for $G_2$, which corresponds to the symmetry group $I_6$ of the hexagon. It is a
highly nontrivial result, and one of the most beautiful pieces of mathematics, that this gives
a complete classification of the finite-dimensional semisimple complex Lie algebras.

The four infinite series of Lie algebras, called the classical Lie algebras, are realized
geometrically as infinitesimal symmetry groups of certain bilinear forms, i.e., Lie algebras of
linear transformations with zero trace whose exponentials leave the form invariant. The Lie
algebras $A_n$ are isomorphic to the special linear Lie algebras $sl(n+1, \mathbb{C})$ of
$(n+1)\times(n+1)$-matrices with complex entries and trace zero. The Lie algebras $B_n$ and $D_n$
are the odd and even special orthogonal Lie algebras $so(m, \mathbb{C})$ ($m = 2n+1$ and $m = 2n$,
respectively), consisting of complex antisymmetric $m\times m$-matrices. For the C-series we have
$C_n = sp(2n, \mathbb{C})$, where the symplectic Lie algebras $sp(2n, \mathbb{C})$ are given by the
complex $2n\times 2n$-matrices X satisfying $X^TJ + JX = 0$, where J is the antisymmetric
$2n\times 2n$-matrix given in block form by

$$J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$$

with $n\times n$ blocks, 1 denoting the identity matrix.
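The defining condition $X^TJ + JX = 0$ is linear in X, so the dimension of $sp(2n)$ can be checked by computing the kernel of the map $X \mapsto X^TJ + JX$. The following sketch (helper names are our own) does this exactly over the rationals and also shows that for n = 1 the condition reduces to $\mathrm{tr}(X) = 0$, which is the isomorphism $sp(2,\mathbb{C}) \cong sl(2,\mathbb{C})$ mentioned below.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def unit(n, i, j):
    m = [[0] * n for _ in range(n)]
    m[i][j] = 1
    return m

def J_matrix(n):
    """The 2n x 2n block matrix [[0, -1], [1, 0]] with n x n blocks (1 = identity)."""
    J = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = -1
        J[n + i][i] = 1
    return J

def sp_condition(X, J):
    """X^T J + J X; X lies in sp(2n) iff this vanishes."""
    return [[p + q for p, q in zip(r, s)]
            for r, s in zip(matmul(transpose(X), J), matmul(J, X))]

def rank(vectors):
    """Rank of a list of vectors, by Gauss-Jordan elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def sp_dimension(n):
    """dim sp(2n) = (2n)^2 - rank of the linear map X -> X^T J + J X."""
    m, J = 2 * n, J_matrix(n)
    images = [[y for row in sp_condition(unit(m, i, j), J) for y in row]
              for i in range(m) for j in range(m)]
    return m * m - rank(images)

print([sp_dimension(n) for n in (1, 2, 3)])          # [3, 10, 21], matching n(2n + 1)
print(sp_condition([[2, 5], [-3, 7]], J_matrix(1)))  # [[0, -9], [9, 0]], i.e. tr(X) J
```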



For each complex Lie algebra L of the A-, B-, C- or D-series, there is an associated (simply
connected) Lie group denoted by the same, but capitalized, letters, whose complexified
tangent space at the identity coincides with the Lie algebra L.

There is some redundancy in the nomenclature for low-dimensional Lie algebras:
$A_1 \cong B_1 \cong C_1$, $B_2 \cong C_2$. It is easy to check that
$sp(2, \mathbb{C}) \cong sl(2, \mathbb{C}) \cong so(3, \mathbb{C})$. The Lie algebra so(2) is
one-dimensional (and hence abelian) and therefore not simple. The Lie algebra $so(4, \mathbb{C})$
is in fact semisimple, $so(4, \mathbb{C}) \cong so(3, \mathbb{C}) \oplus so(3, \mathbb{C})$, and not
simple since each $so(3, \mathbb{C})$-factor is a nontrivial ideal. The Lie algebra $so(6, \mathbb{C})$
is isomorphic to $sl(4, \mathbb{C})$. For the just mentioned reasons, one starts the D-series with
$n = 4$; $so(8, \mathbb{C})$ is the first in the series that is not isomorphic to any other. In fact,
$so(8, \mathbb{C})$ is very special in that it has a large automorphism group (related to triality).
For the exceptional simple Lie algebras $E_6$, $E_7$, $E_8$, $F_4$ and $G_2$, there is no simple
geometric description as for those in the B-, C- and D-series. However, the exceptional simple
Lie algebras can be realized as infinitesimal symmetry groups of some algebraic structure (for
example, $G_2$ arises from the derivations of the octonions). And to each exceptional Lie
algebra L one can associate a Lie group, such that L is the complexification of the tangent
space at the identity.
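A quick consistency check on the coincidences just listed is to compare dimensions, using the standard formulas $\dim sl(n+1) = n(n+2)$, $\dim so(m) = m(m-1)/2$ and $\dim sp(2n) = n(2n+1)$. The sketch below is only a necessary condition (equal dimension does not imply isomorphism); the function and dictionary names are our own.

```python
# Dimensions of the complex simple Lie algebras (standard formulas, n = rank).
def dim_classical(series, n):
    if series == "A":
        return n * (n + 2)        # sl(n+1, C): (n+1)^2 - 1
    if series in ("B", "C"):
        return n * (2 * n + 1)    # so(2n+1, C) and sp(2n, C)
    if series == "D":
        return n * (2 * n - 1)    # so(2n, C): m(m-1)/2 with m = 2n
    raise ValueError(series)

EXCEPTIONAL = {"G2": 14, "F4": 52, "E6": 78, "E7": 133, "E8": 248}

# The low-dimensional coincidences (equal dimension is necessary, not sufficient)
assert dim_classical("A", 1) == dim_classical("B", 1) == dim_classical("C", 1) == 3
assert dim_classical("B", 2) == dim_classical("C", 2) == 10
assert dim_classical("D", 2) == 2 * dim_classical("A", 1)    # so(4, C) = so(3, C) + so(3, C)
assert dim_classical("D", 3) == dim_classical("A", 3) == 15  # so(6, C) = sl(4, C)

# Rank 4: so(8, C) = D_4 shares its dimension with no other simple algebra of that rank
rank4 = {s: dim_classical(s, 4) for s in "ABCD"}
rank4["F4"] = EXCEPTIONAL["F4"]
print(rank4)  # {'A': 24, 'B': 36, 'C': 36, 'D': 28, 'F4': 52}
```

Note that $B_4$ and $C_4$ both have dimension 36 without being isomorphic, which illustrates why the dimension count alone is not a proof.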

It is important to keep in mind over which field the Lie algebra is considered. For example,
over the real numbers the Lie algebras $so(p, q) = so(p, q; \mathbb{R})$ with fixed $p + q \ge 3$ are
pairwise non-isomorphic, apart from the trivial isomorphism $so(p, q) \cong so(q, p)$. Over the
complex numbers we have $so(p, q; \mathbb{C}) \cong so(p + q, \mathbb{C})$, since over the complex
numbers the signature of a nondegenerate symmetric bilinear form is not invariant. Even more
drastic phenomena depend on the field: the real Lie algebra so(1,3), which is extremely
important in physics, is simple, but extending the field to the complex numbers we have
$so(1, 3; \mathbb{C}) \cong so(4, \mathbb{C}) \cong so(3, \mathbb{C}) \oplus so(3, \mathbb{C})$, which is not simple.
(For applications to physics, this is actually an advantage.) However, this is as bad as it
can get from the structural point of view: if a Lie algebra is semisimple over some field K,
then it is semisimple over all fields containing K. This follows from the Lemma of Cartan
10.6.1: if the Killing form is nondegenerate over some field, then extending the field does
not change this property, since the determinant of its Gram matrix in a fixed basis stays nonzero.
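The different behavior of so(4) and so(1,3) can be seen with explicit 4×4 generators. In the sketch below (our own basis choice and helper names), $L_i$ are rotations of the first three coordinates and $K_i$ mix them with the fourth, either as antisymmetric generators (so(4)) or as boosts for $\eta = \mathrm{diag}(1,1,1,-1)$ (so(1,3)). The combinations $L_i \pm K_i$ commute for so(4), but for so(1,3) one needs the complex combinations $L_i \pm iK_i$, which only exist after complexification.

```python
def unit(i, j, n=4):
    m = [[0] * n for _ in range(n)]
    m[i][j] = 1
    return m

def lin(a, b, s=1):
    """The matrix a + s*b (s may be an int or a complex number)."""
    return [[x + s * y for x, y in zip(r, q)] for r, q in zip(a, b)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def bracket(a, b):
    return lin(matmul(a, b), matmul(b, a), -1)

def is_zero(a):
    return all(x == 0 for row in a for x in row)

# Rotations L_1, L_2, L_3 of the first three coordinates
L = [lin(unit(1, 2), unit(2, 1), -1), lin(unit(2, 0), unit(0, 2), -1), lin(unit(0, 1), unit(1, 0), -1)]
K_so4 = [lin(unit(i, 3), unit(3, i), -1) for i in range(3)]  # so(4): antisymmetric generators
K_so13 = [lin(unit(i, 3), unit(3, i), 1) for i in range(3)]  # so(1,3), eta = diag(1,1,1,-1): boosts

def splits(L, K, c):
    """True if every L_i + c K_i commutes with every L_j - c K_j."""
    P = [lin(l, k, c) for l, k in zip(L, K)]
    Q = [lin(l, k, -c) for l, k in zip(L, K)]
    return all(is_zero(bracket(p, q)) for p in P for q in Q)

print(splits(L, K_so4, 1))    # True: so(4) splits into two commuting so(3) copies over R
print(splits(L, K_so13, 1))   # False: the same real combinations fail for so(1,3)
print(splits(L, K_so13, 1j))  # True: L +- iK split the complexification so(1,3; C)
```

The sign difference behind this is that the boosts satisfy $[K_1, K_2] = +L_3$ whereas the compact generators satisfy $[K_1, K_2] = -L_3$ in this basis; multiplying K by i flips that sign.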

The (semi-)simple real Lie algebras can also be classified, although the classification is a
bit more complicated. See for example the books of Gilmore [92] (or, for the more
mathematically minded, Helgason [111] or Knapp [137]). If a real Lie algebra L is
simple, the complex extension (obtained by letting the scalars be complex) is either simple
or of the form $S \oplus S$ for a simple complex Lie algebra S. Hence the classification of the
real simple Lie algebras is still 'close' to the classification of the simple complex Lie
algebras in the sense that no completely new structures appear. Historically, the
classification of the complex simple Lie algebras was found by Wilhelm Killing and put on a
rigorous footing by Élie Cartan, who later also classified the real simple Lie algebras.

As we shall see in Chapter 14, the unitary representations of different real forms of the
same complex Lie algebra can be quite different. The Lie algebra so(2,1) does not admit
a nontrivial finite-dimensional unitary representation, whereas so(3) does. All compact Lie
algebras admit a unitary representation, and in fact, the adjoint representation is already
unitary. The main difficulty in the proof of this lies in establishing that all compact Lie
algebras admit a Lie *-algebra structure; this requires more theory and will not be discussed
here. Since finite-dimensional unitary groups are compact, noncompact simple Lie algebras
cannot have finite-dimensional unitary representations, apart from the trivial one which
maps everything to zero.

10.7 Automorphisms and coadjoint orbits

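The adjoint and coadjoint representations of a Lie algebra L extend to elements $g \in \mathrm{Aut}\,L$
by defining

$$\delta^g := g\delta g^{-1} = \mathrm{Ad}_g\,\delta \quad \text{for } \delta \in L,$$
$$\omega^g := \mathrm{Ad}^*_{g^{-1}}\,\omega \quad \text{for } \omega \in L^*$$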

with the properties 

and for continuous motions g e C^([0, 1], AutL), 

dr dr 

iL^.M = ^(^)nu;3W; 
dr dr 

in short, 

(59)-=^n5^ {uj^y ^g^u^. 

A set $\Omega \subseteq V$ is called L-invariant (in a given representation Q of L on V) if, for all
$s \in C([0, 1], L)$ and all $\omega_0 \in \Omega$ there is a unique $\omega \in C^1([0, 1], \Omega)$ such that

$$\dot\omega(\tau) = Q(s(\tau))\,\omega(\tau), \qquad \omega(0) = \omega_0. \qquad (10.3)$$

The set of points $\omega(1)$ reachable from a fixed $\omega_0$ in this way is called the orbit
$\mathrm{Orb}(\omega_0)$ of $\omega_0$. The orbits partition V, and Ω is invariant iff it is a union of
orbits. The coadjoint orbits are the orbits in the coadjoint representation on $L^*$. Apparently,
$\Omega = \mathrm{Orb}(\omega)$ is a manifold homeomorphic to $\mathrm{Aut}\,L/\mathrm{Stab}(\omega)$, and the
tangent space at ω is

$$T_\omega\Omega = \{Q(s)\omega \mid s \in L\}.$$

The coadjoint orbits correspond to maximal subgroups and are symplectic manifolds with
closed 2-form $\omega(f, g) := \mathrm{tr}\,\rho(f \angle g)$. The set of all $\omega \in L^*$ for which a fixed
set of Casimirs takes fixed values is always invariant.
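For so(3) this can be watched numerically. Identifying so(3) and its dual with $\mathbb{R}^3$ in the usual way (an assumption of this sketch; the overall sign convention does not matter here), the coadjoint flow (10.3) becomes $\dot\omega = s(\tau)\times\omega$, and the Casimir $|\omega|^2$ stays fixed, so the coadjoint orbits are spheres. The curve `s` below is a hypothetical choice made only for illustration, and the flow is integrated with the classical Runge-Kutta method, so conservation holds only up to a tiny discretization error.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def s(tau):
    """A hypothetical curve s(tau) in so(3) = R^3, chosen only for illustration."""
    return (math.cos(3 * tau), tau, 1.0)

def rhs(tau, w):
    return cross(s(tau), w)

def flow(w0, steps=1000):
    """Integrate dw/dtau = s(tau) x w, w(0) = w0, over [0, 1] with classical RK4."""
    h, w = 1.0 / steps, tuple(w0)
    for k in range(steps):
        t = k * h
        k1 = rhs(t, w)
        k2 = rhs(t + h / 2, tuple(x + h / 2 * y for x, y in zip(w, k1)))
        k3 = rhs(t + h / 2, tuple(x + h / 2 * y for x, y in zip(w, k2)))
        k4 = rhs(t + h, tuple(x + h * y for x, y in zip(w, k3)))
        w = tuple(x + h / 6 * (a + 2 * b + 2 * c + d) for x, a, b, c, d in zip(w, k1, k2, k3, k4))
    return w

def casimir(w):
    """The quadratic Casimir |w|^2 of so(3)."""
    return sum(x * x for x in w)

w0 = (1.0, 2.0, 2.0)
w1 = flow(w0)
print(w1)                        # a different point ...
print(casimir(w0), casimir(w1))  # ... on the same sphere |w|^2 = 9, up to rounding
```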
Chapter 11

Fields, forms, and derivatives 



In this chapter we introduce basic material on manifolds, the associated commutative
algebra of scalar fields, and the Lie algebra of vector fields. All manifolds used in this book are
arbitrarily often differentiable, real manifolds whose dimension need not be finite. However, 
we are very brief and sometimes incomplete in the technical details that need attention in 
the infinite-dimensional case; on first reading, the reader may restrict everything to the 
finite-dimensional case, where these details are not required. 

We first recall some basics from differential geometry. Our approach differs from standard
introductions to differential geometry since, consistent with the theme of the book, all
definitions are given in an algebraic way. As a side benefit, this prepares the reader for
noncommutative geometry, only briefly touched on in this book, where a manifold structure is
no longer available and all geometry enters in an algebraic way. Among other applications,
noncommutative geometry gives an interesting geometric perspective on the quantum field
theory of the standard model.

Vector fields on a manifold M are essentially equivalent to derivations on the commutative
algebra $C^\infty(M)$ of scalar fields. However, to be able to use the traditional terminology,
where vector fields and the corresponding derivations (Lie derivatives) are distinguished,
we introduce an abstract set $W = \mathrm{vect}\,M$ of vector fields, whose elements are put into
correspondence with derivations by means of a mapping $d : \mathrm{vect}\,M \to \mathrm{Der}\,M$ which is
applied on the right. In this way, the calculus on manifolds can be formulated in a purely
algebraic way, without any reference to the manifold.

We therefore formulate everything in terms of an arbitrary topological commutative algebra
E in place of $C^\infty(M)$, and an arbitrary set W in place of vect M. However, the main situation
that the reader should have in mind is where E is an algebra of complex-valued, arbitrarily
often differentiable functions on a finite-dimensional manifold, for example $C^\infty(\mathbb{R}^n)$. But
E could also be the Schwartz space of arbitrarily often differentiable functions all of whose
derivatives decay faster than polynomially at infinity.

As a result, our presentation is completely coordinate-free, except in some examples. For
readers accustomed to differential geometry in index notation but not to the coordinate-free
Cartan notation, we suggest that they translate the definitions and main results into
coordinates to understand their meaning, but treat proofs as if the concepts introduced
new abstract algebraic notions.



11.1 Scalar fields and vector fields 

We introduce the objects, operators, and operations needed for presenting the traditional 
differential calculus in a purely algebraic framework: Lie derivatives applied to multilinear
forms, and exterior products and the exterior derivative of alternating forms. As the
most important special case, we consider manifolds and associated geometric notions, in 
particular diffeomorphisms. 

Before giving the definitions, we discuss the letter conventions and priority rules used in 
the formulas. 

We typically (i.e., when not forced by conflicts or tradition to do otherwise) use lower case
letters from the middle of the alphabet, such as f, g, h, to denote scalar fields, capital letters
from the end of the alphabet, such as X, Y, Z, to denote vector fields, capital letters from
the beginning of the alphabet, such as A, B, for general multilinear forms, but ζ for linear
forms, ω for alternating bilinear forms, and η for symmetric bilinear forms.

We use the convention that a Lie derivative acts on the shortest following expression which is 
syntactically a vector field or a multilinear form. Similarly, the exterior derivative operator 
d acts either on the right on a vector field, or on the left on the shortest following expression 
that is syntactically an alternating form. The wedge product ∧ has lower priority than the
operations written as juxtaposition, but higher priority than + and −.

11.1.1 Definition. 

(i) A differential geometry consists of an algebra E containing ℂ, a left E-module W
with an additional Lie product ∠, both equipped with a topology such that all operations
are continuous, and a continuous mapping d (written on the right), which maps $X \in W$ to
$Xd \in \mathrm{Der}\,E$, such that

$$(X + Y)d = Xd + Yd, \qquad (fX)d = f(Xd), \qquad (X \angle Y)d = [Xd, Yd] \qquad (11.1)$$

for all $X, Y \in W$, $f \in E$. The differential geometry is called (non-)commutative if the
multiplication in E is (non-)commutative.

(ii) We refer to the elements of E as scalar fields, and to the elements of W as vector
fields. The Lie derivative of a vector field X is the linear mapping $L_X$ which maps a
scalar field f to

$$L_X f := Xd\,f, \qquad (11.2)$$

and a vector field Y to

$$L_X Y := X \angle Y = -L_Y X. \qquad (11.3)$$

As will become apparent in Section 11.3 (cf. Theorem 11.3.2), we may also read the term Xd f
as the product of the vector field X with the exact linear form df. Until then, we shall write an
explicit space after d to remind the reader of the correct way to group the letters.

The scalar field $L_X f$ (resp. the vector field $L_X Y$) is called the directional derivative of
the scalar field f (resp. the vector field Y) in the direction of the vector field X.

11.1.2 Example. (Canonical differential geometries) 

Let E be an arbitrary topological algebra containing ℂ. We may give E the structure of
a differential geometry by picking an arbitrary set W with the same cardinality as Der E,
and choosing an arbitrary bijection d from W to Der E. W inherits all properties of Der E
by means of the bijection d: we turn W into a Lie algebra and a topological E-module by
defining

$$X + Y := (Xd + Yd)d^{-1}, \qquad fX := (f(Xd))d^{-1}, \qquad X \angle Y := [Xd, Yd]d^{-1}$$

for $X, Y \in W$ and $f \in E$, and by calling a set $S \subseteq W$ closed if its image under d is closed
in the topology of Der E induced by that of E. The result is a differential geometry. We
call differential geometries constructed in this way canonical.

The following example is responsible for the naming. Interpreting the set M in the example
as the domain of a chart of a finite-dimensional manifold, one can translate everything said
here to general finite-dimensional manifolds by a process described in all books on
differential geometry. Thus the example gives essentially the full intuition for our constructions,
except for the complications that may arise in infinite dimensions.

11.1.3 Example. (Differential geometry of open subsets in ℝ^n)

Let $\mathbb{R}^n$ denote the vector space of row vectors $x = (x^1, \ldots, x^n)$ with n real components
$x^j$, let M be a nonempty, open subset of $\mathbb{R}^n$, and let $E = C^\infty(M)$ and $W = C^\infty(M, \mathbb{C}^n)$,
equipped with the weak topology. Thus scalar fields are complex-valued functions, while
vector fields are row vector valued functions. In terms of the partial differential operators
$\partial_j$ defined by

$$\partial_j f(x) := \partial f(x)/\partial x^j,$$

we define the gradient $\partial f$ of a scalar field as the column vector with the n entries
$\partial_1 f, \ldots, \partial_n f$. It is not difficult to show that a derivation on the algebra of scalar
fields can be uniquely expressed as a linear partial differential operator of the form

$$\delta = Xd = \sum_{j=1}^n X^j \partial_j,$$

with a vector field X. Such a δ acts on scalar fields f as

$$\delta f = Xd\,f = \sum_{j=1}^n X^j \partial_j f.$$

The index notation corresponds to standard differential geometric practice when working in a chart of a
manifold (which is essentially the situation we are discussing here). The interpretation in terms of rows (row
vectors = rovectors, indexed by upper indices = roindices) and columns (column vectors = covectors,
indexed by lower indices = coindices) makes the transition to standard linear algebra transparent.

Thus the mapping d which maps the vector field X to the differential operator Xd is a
bijection of the type required in the previous example, so we have a canonical differential
geometry; it is clearly commutative. The reader is invited to check that the Lie derivative
takes the form

$$L_X f = X\partial f, \qquad L_X Y = X\partial Y - Y\partial X. \qquad (11.4)$$
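The claim behind (11.4), that the vector field with components $Xd\,Y^j - Yd\,X^j$ realizes the commutator of the derivations $Xd$ and $Yd$ as required by (11.1), can be checked exactly for polynomial fields on $\mathbb{R}^2$. The sketch below stores a polynomial as a dictionary from exponent tuples to integer coefficients; this representation and the helper names are our own assumptions.

```python
from collections import defaultdict

# A polynomial in n variables is a dict {exponent tuple: coefficient}; the zero polynomial is {}.
def clean(p):
    return {m: c for m, c in p.items() if c != 0}

def add(p, q, s=1):
    """p + s*q"""
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += s * c
    return clean(r)

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
    return clean(r)

def diff(p, j):
    """Partial derivative d_j p."""
    r = defaultdict(int)
    for m, c in p.items():
        if m[j]:
            r[m[:j] + (m[j] - 1,) + m[j + 1:]] += c * m[j]
    return clean(r)

def apply(X, f):
    """Xd f = sum_j X^j d_j f for a vector field X = [X^1, ..., X^n]."""
    r = {}
    for j, Xj in enumerate(X):
        r = add(r, mul(Xj, diff(f, j)))
    return r

def lie(X, Y):
    """X angle Y, with components Xd Y^j - Yd X^j as in (11.4)."""
    return [add(apply(X, Yj), apply(Y, Xj), -1) for Xj, Yj in zip(X, Y)]

one, x1, x2 = {(0, 0): 1}, {(1, 0): 1}, {(0, 1): 1}
d1, x1d2 = [one, {}], [{}, x1]   # the vector fields d_1 and x^1 d_2
print(lie(d1, x1d2))             # [{}, {(0, 0): 1}], i.e. the vector field d_2

# (11.1): (X angle Y)d agrees with the commutator [Xd, Yd] on a sample scalar field
X, Y = [x2, mul(x1, x1)], [mul(x1, x2), one]
f = add(mul(mul(x1, x1), x2), {(0, 1): 3})
print(apply(lie(X, Y), f) == add(apply(X, apply(Y, f)), apply(Y, apply(X, f)), -1))  # True
```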

A second, noncanonical differential geometry results by using in the above construction in
place of $C^\infty(M)$ the subalgebra $C_0^\infty(M)$ of scalar fields with compact support, and in place
of $C^\infty(M, \mathbb{C}^n)$ the subspace $C_0^\infty(M, \mathbb{C}^n)$ of vector fields with compact support.

11.1.4 Proposition. The product rule

$$L_X(fg) = (L_X f)g + f(L_X g), \qquad L_X(fY) = (L_X f)Y + f(L_X Y), \qquad (11.5)$$

the commutation rule

$$[L_X, L_Y] = L_{X \angle Y}, \qquad (11.6)$$

and the equations

$$L_{fX}\,g = fL_X g, \qquad (11.7)$$
$$L_{fX}Y = fL_X Y - (Yd\,f)X \qquad (11.8)$$

hold for $f, g \in E$ and $X, Y \in W$.

Proof. The first part of (11.5) is trivial since in this case $L_X = Xd$ is a derivation on E.
For the second part of (11.5), we note that

$$(L_X(fY))d\,g = (X \angle fY)d\,g = [Xd, fYd]g = Xd\big(f(Yd\,g)\big) - fYd(Xd\,g) = (Xd\,f)(Yd\,g) + fXd(Yd\,g) - fYd(Xd\,g).$$

Also,

$$((L_X f)Y)d\,g = ((Xd\,f)Y)d\,g = (Xd\,f)(Yd\,g)$$

and

$$(f(L_X Y))d\,g = (f(X \angle Y))d\,g = f[Xd, Yd]g = f\big(Xd(Yd\,g) - Yd(Xd\,g)\big) = fXd(Yd\,g) - fYd(Xd\,g).$$

Putting these three pieces together proves the second part of (11.5).

To prove (11.6), note that $L_Y f = Yd\,f$ is in E, so $L_X L_Y f = L_X(Yd\,f) = Xd(Yd\,f)$.
Interchanging X, Y we find for the commutator:

$$[L_X, L_Y]f = Xd(Yd\,f) - Yd(Xd\,f) = [Xd, Yd]f = (X \angle Y)d\,f = L_{X \angle Y}f.$$

On vector fields, (11.6) states that $X \angle (Y \angle Z) - Y \angle (X \angle Z) = (X \angle Y) \angle Z$, which is
the Jacobi identity for ∠.

Formula (11.7) is immediate from the definition, and (11.8) follows from the product rule
$X \angle fY = (Xd\,f)Y + f(X \angle Y)$ by swapping X and Y, using the anticommutativity of ∠.

□
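The identities of Proposition 11.1.4 can be tested exactly for polynomial fields on $\mathbb{R}^2$, reusing the dictionary representation of polynomials from the sketch after (11.4) (repeated here so the block runs on its own). The sample fields are arbitrary choices; a passing check on examples is of course no substitute for the proof.

```python
from collections import defaultdict

def clean(p):
    return {m: c for m, c in p.items() if c != 0}

def add(p, q, s=1):
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += s * c
    return clean(r)

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
    return clean(r)

def diff(p, j):
    r = defaultdict(int)
    for m, c in p.items():
        if m[j]:
            r[m[:j] + (m[j] - 1,) + m[j + 1:]] += c * m[j]
    return clean(r)

def apply(X, f):
    """Xd f = sum_j X^j d_j f."""
    r = {}
    for j, Xj in enumerate(X):
        r = add(r, mul(Xj, diff(f, j)))
    return r

def lie(X, Y):
    """X angle Y = L_X Y, componentwise Xd Y^j - Yd X^j."""
    return [add(apply(X, Yj), apply(Y, Xj), -1) for Xj, Yj in zip(X, Y)]

def scale(f, X):
    """The vector field fX."""
    return [mul(f, Xj) for Xj in X]

def vsub(X, Y):
    return [add(a, b, -1) for a, b in zip(X, Y)]

one, x1, x2 = {(0, 0): 1}, {(1, 0): 1}, {(0, 1): 1}
f = add(mul(x1, x2), {(0, 0): 2})  # f = x1 x2 + 2
g = mul(x1, mul(x1, x1))            # g = x1^3
X, Y, Z = [x2, mul(x1, x1)], [mul(x1, x2), one], [x1, mul(x2, x2)]

# (11.5): both product rules
assert apply(X, mul(f, g)) == add(mul(apply(X, f), g), mul(f, apply(X, g)))
assert lie(X, scale(f, Y)) == [add(a, b) for a, b in zip(scale(apply(X, f), Y), scale(f, lie(X, Y)))]
# (11.6) on vector fields, i.e. the Jacobi identity
assert vsub(lie(X, lie(Y, Z)), lie(Y, lie(X, Z))) == lie(lie(X, Y), Z)
# (11.7) and (11.8)
assert apply(scale(f, X), g) == mul(f, apply(X, g))
assert lie(scale(f, X), Y) == vsub(scale(f, lie(X, Y)), scale(apply(Y, f), X))
print("(11.5)-(11.8) hold for the sample fields")
```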
In the following, we develop the differential calculus for commutative differential geometries
only; thus, with the exception of the remarks on noncommutative geometry in Section 11.5,
the algebra E of scalar fields is always assumed to be commutative. In this case,
we extend the left module structure on vector fields to a bimodule structure by putting

$$Xf := fX$$

for $f \in E$ and $X \in W$. Note that some authors treat vector fields as synonymous with
derivations and therefore write X(f) for Xd f. This should not be confused with the
present notation Xf for multiplying the vector field X with the scalar field f.



11.2 Multilinear forms 

Apart from scalar and vector fields, differential geometry makes heavy use of multilinear 
forms and tensors, which we define next. 

11.2.1 Definitions. 

(i) A linear form ζ is a continuous, E-linear mapping $\zeta : W \to E$ (written on the right)
which maps the vector field X to the scalar field Xζ. We write $W^*$ for the E-module
consisting of all linear forms (sometimes called the "E-dual" of W), with scalar multiplication
of $\zeta \in W^*$ by $f \in E$ defined via

$$X(f\zeta) := f(X\zeta).$$

(ii) A c-linear form is a mapping $\phi : W \times \ldots \times W \to E$ (with c factors of W in the
Cartesian product) such that the image $X_1\ldots X_c\phi$ of $(X_c, \ldots, X_1) \in W \times \ldots \times W$
depends E-linearly on each argument $X_k$, i.e., for all $X_k, Y, Z \in W$ and $f, g \in E$,

$$X_1\ldots(fY + gZ)\ldots X_c\,\phi = f\,X_1\ldots Y\ldots X_c\,\phi + g\,X_1\ldots Z\ldots X_c\,\phi.$$

Here the unindexed argument between the dots replaces the kth argument $X_k$, for some k
in 1, ..., c. In the degenerate case c = 0, we consider the 0-linear mappings to be the scalar
fields.

Strictly speaking, they should be called E-linear forms, and a similar remark applies later to multilinear
forms. Talking about a form rather than a mapping implies the assumption of continuity.
The standard notation for Xζ is $i_X\zeta = \zeta(X)$; the present notation simply replaces $i_X$ by X. This way of
writing the mapping generalizes standard matrix calculus if we use the intuition gained from Example 11.1.3
and think of vector fields as row vectors and of linear forms as column vectors, an intuition that extends to
matrix fields. Since in the general situation, linear forms are often called covectors, we shall occasionally
use the analogous word rovector to denote a vector field, although, strictly speaking, one should talk
about covector fields and rovector fields. The same ambiguity is traditionally maintained for multilinear
forms on manifolds, which refer both to the corresponding fields and to their values at a particular point.

The traditional notation for $X_1\ldots X_c\phi$ is $i_{X_1}\cdots i_{X_c}\phi = \phi(X_c, \ldots, X_1)$; as for linear forms, the present
notation simply replaces the $i_X$ by X. Note the reverse order resulting in the arguments written in the
traditional way, needed in order that (11.9) together with our definition (11.10) of insertion is consistent
with the traditional definition $(i_X\phi)(X_1, \ldots, X_{c-1}) = \phi(X, X_1, \ldots, X_{c-1})$. In our notation this translates into
$X_{c-1}\ldots X_1(i_X\phi) = X_{c-1}\ldots X_1X\phi$.

(iii) We write $W_c$ for the E-module of continuous c-linear mappings on W. Scalar
multiplication of $\phi \in W_c$ by $f \in E$ is defined via

$$X_1\ldots X_c(f\phi) := f(X_1\ldots X_c\,\phi).$$

The elements of $W_c$ are called multilinear forms or c-linear forms; for c = 2 also
bilinear forms. Note that $W_0 = E$ consists of scalar fields (or 0-forms), and $W_1 = W^*$
consists of linear forms (or 1-forms).

(iv) The product of a vector field $X \in W$ and a c-linear form $\phi \in W_c$ is for c = 0 the vector
field Xφ defined by scalar multiplication with the scalar φ, and for c > 0 the (c − 1)-linear
form Xφ defined by

$$X_1\ldots X_{c-1}(X\phi) := X_1\ldots X_{c-1}X\phi \quad \text{for all } X_1, \ldots, X_{c-1} \in W. \qquad (11.9)$$

The operator $i_X$ defined on multilinear forms by

$$i_X\phi := X\phi \qquad (11.10)$$

is traditionally called the insertion of $X \in W$; cf. footnote 11.2.1.

(v) A c-linear form φ is called alternating (or a c-form) if either $c \le 1$ or Xφ is alternating
and XXφ = 0 for all vector fields X. φ is called symmetric if either $c \le 1$ or Xφ is
symmetric and XYφ = YXφ for all vector fields X, Y. We write $A_c$ and $S_c$ for the space
of alternating and symmetric c-linear forms, respectively. In particular,

$$A_0 = W_0 = E, \qquad A_1 = W_1, \qquad A_2 \oplus S_2 = W_2.$$

(vi) The transpose of a bilinear form φ is the bilinear form $\phi^T$ defined by

$$XY\phi^T := YX\phi \qquad (11.11)$$

for all vector fields X, Y. In particular, a bilinear form φ is symmetric iff $\phi^T = \phi$ and
alternating iff $\phi^T = -\phi$. A bilinear form φ is called nondegenerate if every linear form
$\zeta \in W^*$ can be written as $\zeta = X\phi$ for a unique vector field X; otherwise degenerate. A
bilinear form φ may be considered as a linear mapping from W to $W^*$ that maps the vector
field X to the linear form Xφ. If φ is nondegenerate, this mapping is invertible, and the
inverse $\phi^{-1}$ is a linear mapping from $W^*$ to W, which maps a linear form ζ to the vector
field $\zeta\phi^{-1}$ in such a way that

$$\zeta\phi^{-1}\phi = \zeta. \qquad (11.12)$$

A nondegenerate bilinear form is called a symplectic form if it is alternating, and a
metric if it is symmetric.

Writing the "c" in $W_c$ as a subscript serves as a reminder that when an element φ of $W_3$ (say) is
written in index notation, it has 3 lower indices: $\phi_{ijk}$. Indeed the "c" is intended to be suggestive of
"covector". Note that in terms of the direct products of c factors W or $W^*$, there is a canonical isomorphism
$W^* \times \ldots \times W^* \cong (W \times \ldots \times W)^*$.






(vii) A [c, r]-tensor field T is a continuous E-linear mapping $T : W_r \to W_c$. The space of
[c, r]-tensor fields is denoted by $W[c, r] = W_c^r = \mathrm{Lin}(W_r, W_c)$. A [1, 1]-tensor field is called
a matrix field.

11.2.2 Remarks. (i) Multilinearity implies (expand $(X + Y)(X + Y)\omega = 0$)

$$XY\omega = -YX\omega \quad \text{for alternating c-forms } \omega \text{ with } c \ge 2.$$

Thus, for $c \ge 2$,

$$i_X^2 = 0, \qquad i_Xi_Y = -i_Yi_X \quad \text{on alternating c-linear forms,} \qquad (11.13)$$

whereas

$$i_Xi_Y = i_Yi_X \quad \text{on symmetric c-linear forms.} \qquad (11.14)$$

(ii) Note that there is a canonical identification of W[c, 0] with $W_c$, and a canonical embedding
of W into W[0, 1]. In the case of finite-dimensional manifolds, we may also identify
W[0, 1] with W.

(iii) The ordinary operator product of a [c′, c]-tensor and a [c, r]-tensor is well-defined, and
is a [c′, r]-tensor: $W[c', c]\,W[c, r] \subseteq W[c', r]$. In particular, $W[1, 1] = \mathrm{Lin}\,W_1$ is an algebra of
matrix fields.
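Relations (11.13) and (11.14) can be illustrated with constant-coefficient forms on $\mathbb{R}^3$. In this sketch a c-linear form is modeled as a Python function of c vectors in the traditional argument order, and insertion follows the traditional rule $(i_X\phi)(X_1,\ldots) = \phi(X, X_1, \ldots)$; the determinant serves as an alternating 3-form and $\sum_i u_iv_iw_i$ as a symmetric one. These concrete forms and names are our own choices.

```python
from itertools import product

def insert(X, phi):
    """Traditional insertion (i_X phi)(X1, ...) = phi(X, X1, ...); written X phi in the text."""
    return lambda *args: phi(X, *args)

def det3(u, v, w):
    """An alternating 3-linear form on R^3 (the determinant)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def sym3(u, v, w):
    """A symmetric 3-linear form on R^3."""
    return sum(a * b * c for a, b, c in zip(u, v, w))

vectors = [(1, 0, 2), (0, 3, -1), (2, 1, 1), (-1, 4, 0)]
for X, Y, Z in product(vectors, repeat=3):
    # i_X i_Y phi = X(Y phi), evaluated on the remaining argument Z
    assert insert(X, insert(Y, det3))(Z) == -insert(Y, insert(X, det3))(Z)  # (11.13)
    assert insert(X, insert(X, det3))(Z) == 0                               # (11.13): i_X^2 = 0
    assert insert(X, insert(Y, sym3))(Z) == insert(Y, insert(X, sym3))(Z)   # (11.14)
print("insertion identities hold on all sample vectors")
```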

11.2.3 Theorem. For every vector field X, the Lie derivative can be extended uniquely
to a linear operator $L_X$ mapping vector fields to vector fields and c-linear forms to c-linear
forms, and satisfying the product rule

$$L_X(f\phi) = (L_Xf)\phi + f(L_X\phi), \qquad L_X(Y\phi) = (L_XY)\phi + Y(L_X\phi) \qquad (11.15)$$

for $f \in E$, $Y \in W$, and $\phi \in W$ or $\phi \in W_c$. The extended Lie derivative satisfies the
commutation rules

$$[L_X, L_Y] = L_{X \angle Y}, \qquad (11.16)$$
$$[L_X, i_Y] = i_{X \angle Y} \qquad (11.17)$$

for $X, Y \in W$.

A [c, r]-tensor is also called a tensor of $\left[{r \atop c}\right]$-valence (Penrose & Rindler [192]). With traditional
index notation, a c-linear form is written with c lower (co)indices, and a [c, r]-tensor is written with r upper
(ro)indices and c lower (co)indices. E.g., a [3, 2]-tensor T is written $T_{ijk}{}^{mn}$, and the image Tφ of a bilinear
form φ is written $(T\phi)_{ijk} = T_{ijk}{}^{mn}\phi_{mn}$ using the traditional Einstein summation convention (which
deletes the explicit indication of the sum over m and n so as not to unnecessarily inflate the formulas
without conveying any more information). In the more modern abstract index notation of Penrose
& Rindler [192], such repeated indices denote instead an insertion (dual pairing) without any implied
connotation of summation over basis-dependent components, and such indices may be used to keep explicit
track of the types of complicated objects.

For the differential geometry of open subsets M of $\mathbb{R}^n$ (Example 11.1.3), $W[c, r] = W_c^r$ is, in the
traditional terminology, the space of sections of the tensor bundle $T_c^rM$.

The Lie derivative can also be extended to tensors $T \in W[c, r]$ by defining

$$(L_XT)B := L_X(TB) - TL_XB \quad \text{for } B \in W_r.$$

We do not need such an extension for our limited applications; it would be needed, however, in a treatment
of general relativity. The reader is invited to verify that $L_XT \in W[c, r]$ and to formulate and prove the
analogues to (11.15) and (11.16).

Proof. We first assume that the product rule holds, and show that this fixes the operation
of $L_X$ on all multilinear forms. By the product rule (11.15),

$$YL_X\phi = L_X(Y\phi) - (L_XY)\phi. \qquad (11.18)$$

This formula shows that $L_X$ is determined on c-linear forms by its action on (c − 1)-linear
forms, and since it is given on scalar fields, it is unique if it exists at all.

Conversely, to show existence of the extension, we define $L_X$ recursively by (11.18), starting
with the known action of $L_X$ on scalar fields.

Since

$$(fY)L_X\phi = L_X(fY\phi) - L_X(fY)\phi = (L_Xf)Y\phi + fL_X(Y\phi) - (L_Xf)Y\phi - f(L_XY)\phi = f\big(L_X(Y\phi) - (L_XY)\phi\big) = f(YL_X\phi),$$

we see inductively that $YL_X\phi$ is E-linear in Y, so that $L_X\phi$ is indeed a tensor.

The first part of the product rule holds since, by (11.18), the equation

$$YL_X(f\phi) = L_X(Yf\phi) - (L_XY)(f\phi) = L_X(fY\phi) - f(L_XY)\phi = (L_Xf)Y\phi + fL_X(Y\phi) - f(L_XY)\phi = Y(L_Xf)\phi + Yf(L_X\phi)$$

holds for all vector fields Y. The second part of the product rule follows directly from (11.18).

To prove (11.16), we first note that by (11.18), we have

$$ZL_XL_Y\phi = L_X(ZL_Y\phi) - (L_XZ)L_Y\phi = L_X\big(L_Y(Z\phi) - (L_YZ)\phi\big) - (L_XZ)L_Y\phi = L_XL_Y(Z\phi) - (L_XL_YZ)\phi - (L_YZ)L_X\phi - (L_XZ)L_Y\phi.$$

The last two terms are symmetric in X, Y, hence cancel when taking the difference with
$ZL_YL_X\phi$ in

$$Z[L_X, L_Y]\phi = ZL_XL_Y\phi - ZL_YL_X\phi = (L_XL_Y - L_YL_X)(Z\phi) - \big((L_XL_Y - L_YL_X)Z\big)\phi = [L_X, L_Y](Z\phi) - ([L_X, L_Y]Z)\phi.$$

Since (11.16) is already known by (11.6) to hold on vector fields and on scalar fields (0-linear
forms), we may assume that we know its validity for the action on c-linear forms. Taking for φ
a (c + 1)-linear form, we may conclude that $Z[L_X, L_Y]\phi = L_{X \angle Y}(Z\phi) - (L_{X \angle Y}Z)\phi = ZL_{X \angle Y}\phi$
by (11.18). Since Z was arbitrary, we conclude that $[L_X, L_Y]\phi = L_{X \angle Y}\phi$ for (c + 1)-linear
forms φ. By induction, (11.16) holds in general.

(11.17) follows from the product rule (11.15) since

$$[L_X, i_Y]\phi = L_X(Y\phi) - Y(L_X\phi) = (L_XY)\phi = (X \angle Y)\phi = i_{X \angle Y}\phi.$$

□

The reader may wish to prove inductively that, for c′-linear forms φ with $c' \ge c$,

$$X_1\ldots X_cL_X\phi = L_X(X_1\ldots X_c\,\phi) - \sum_{k=1}^{c} X_1\ldots(X \angle X_k)\ldots X_c\,\phi.$$
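In the coordinates of Example 11.1.3, the Lie derivative of a linear form ζ is usually written $(L_X\zeta)_j = \sum_i (X^i\partial_i\zeta_j + \zeta_i\partial_jX^i)$. The sketch below takes that coordinate formula as an assumption and checks that it satisfies the defining recursion (11.18) and the commutation rule (11.16) on polynomial sample data; the polynomial representation and helper names are our own.

```python
from collections import defaultdict

def clean(p):
    return {m: c for m, c in p.items() if c != 0}

def add(p, q, s=1):
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += s * c
    return clean(r)

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(a + b for a, b in zip(m1, m2))] += c1 * c2
    return clean(r)

def diff(p, j):
    r = defaultdict(int)
    for m, c in p.items():
        if m[j]:
            r[m[:j] + (m[j] - 1,) + m[j + 1:]] += c * m[j]
    return clean(r)

def apply(X, f):
    """Xd f = sum_j X^j d_j f."""
    r = {}
    for j, Xj in enumerate(X):
        r = add(r, mul(Xj, diff(f, j)))
    return r

def lie(X, Y):
    return [add(apply(X, Yj), apply(Y, Xj), -1) for Xj, Yj in zip(X, Y)]

def pair(Y, zeta):
    """Y zeta = sum_j Y^j zeta_j (row vector times column vector)."""
    r = {}
    for Yj, zj in zip(Y, zeta):
        r = add(r, mul(Yj, zj))
    return r

def lie_form(X, zeta):
    """Assumed coordinate formula (L_X zeta)_j = sum_i (X^i d_i zeta_j + zeta_i d_j X^i)."""
    out = []
    for j in range(len(X)):
        c = apply(X, zeta[j])
        for i in range(len(X)):
            c = add(c, mul(zeta[i], diff(X[i], j)))
        out.append(c)
    return out

one, x1, x2 = {(0, 0): 1}, {(1, 0): 1}, {(0, 1): 1}
X, Y = [x2, mul(x1, x1)], [mul(x1, x2), one]
zeta = [mul(x1, x2), add(x1, mul(x2, x2))]  # zeta = x1 x2 dx1 + (x1 + x2^2) dx2

# (11.18): Y L_X zeta = L_X(Y zeta) - (L_X Y) zeta, with L_X = Xd on the scalar field Y zeta
assert pair(Y, lie_form(X, zeta)) == add(apply(X, pair(Y, zeta)), pair(lie(X, Y), zeta), -1)
# (11.16) on linear forms: [L_X, L_Y] zeta = L_{X angle Y} zeta
commutator = [add(a, b, -1) for a, b in zip(lie_form(X, lie_form(Y, zeta)), lie_form(Y, lie_form(X, zeta)))]
assert commutator == lie_form(lie(X, Y), zeta)
print("(11.18) and (11.16) hold for the sample data")
```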

11.2.4 Proposition. The Lie derivative of an alternating c-form is again an alternating 
c-form. 

Proof. Indeed, for any alternating form ω, using (11.18) twice,

$$YYL_X\omega = Y\big(L_X(Y\omega) - (L_XY)\omega\big) = YL_X(Y\omega) - Y(L_XY)\omega = YL_X(Y\omega) + (L_XY)Y\omega = L_X(YY\omega) = 0.$$

□

11.3 Exterior calculus

In the case of differential geometry in $\mathbb{R}^n$ (Example 11.1.3), the gradient operator ∂
behaves symbolically like a covector, except for the nontrivial behavior implied by
the Leibniz product rule. However, the gradient operator ∂, defined there on scalar fields
only, cannot be extended to a gradient operator that associates with a general c-linear form φ
a (c + 1)-linear form ∂φ such that $L_X\phi = X\partial\phi$ for all vector fields X. The existence of
such an extension would imply that $L_{fX}\phi = fL_X\phi$. While this holds by (11.7) when φ is
a scalar field, it fails already when φ is a linear form. For a linear form ζ we have instead

$$L_{fX}\zeta = fL_X\zeta + df\,X\zeta,$$

which follows as a special case of (11.30) below, or from

$$L_X\zeta = X\,d\zeta + d(X\zeta)$$

by substituting fX for X. However, the gradient can be generalized in a different way to
alternating forms, leading to the exterior derivative.

The generalization is valid not only for Example 11.1.3, but in full generality. To define the 
exterior derivative we need some preparations. 

11.3.1 Theorem. For every linear form ζ, there is a unique E-linear mapping ζ∧ mapping
alternating c-forms to alternating (c + 1)-forms for c = 0, 1, 2, ..., and vector fields to zero,
such that

$$(X\zeta)\omega = X(\zeta\wedge\omega) + \zeta\wedge X\omega \qquad (11.19)$$

for all vector fields X and all alternating c-forms ω. ζ∧ω is called the exterior product
or wedge product of ζ and ω. The exterior product satisfies the rules

$$\zeta\wedge\zeta' = -\zeta'\wedge\zeta, \qquad (11.20)$$
$$L_X(\zeta\wedge\omega) = L_X\zeta\wedge\omega + \zeta\wedge L_X\omega \qquad (11.21)$$

for alternating c-forms ω, linear forms ζ, ζ′, and vector fields X.
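Solving (11.19) for $X(\zeta\wedge\omega)$, and using that ζ∧ maps vector fields (and hence $X\omega$ for a 0-form ω) to zero, gives a recursion that can be run directly. In the sketch below, which assumes constant-coefficient forms on $\mathbb{R}^3$, a c-form is a pair (c, f) where f maps the list $[X_1, \ldots, X_c]$ to the number $X_1\ldots X_c\omega$ in the book's ordering; all names are our own.

```python
def linear_form(coeffs):
    """Constant-coefficient linear form with X zeta = sum_i X^i zeta_i."""
    return (1, lambda vs: sum(x * z for x, z in zip(vs[0], coeffs)))

def insert(X, form):
    """X omega, the (c-1)-form with X1...X_{c-1}(X omega) = X1...X_{c-1} X omega, cf. (11.9)."""
    c, f = form
    return (c - 1, lambda vs: f(list(vs) + [X]))

def wedge(zeta, form):
    """zeta ^ omega, from X(zeta ^ omega) = (X zeta) omega - zeta ^ (X omega)."""
    c, f = form
    z = zeta[1]

    def g(vs):
        *rest, X = vs
        value = z([X]) * f(rest)  # (X zeta) omega, evaluated on the remaining arguments
        if c > 0:                 # for c = 0, X omega is a vector field, mapped to zero by zeta ^
            value -= wedge(zeta, insert(X, form))[1](rest)
        return value

    return (c + 1, g)

z1, z2, z3 = linear_form((1, 0, 0)), linear_form((0, 1, 0)), linear_form((0, 0, 1))
e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
w = wedge(z1, wedge(z2, z3))
print(w[1]([e3, e2, e1]), w[1]([e1, e2, e3]))  # 1 -1

A, B = (1, 2, 0), (3, -1, 4)
za, zb = linear_form((2, 0, 1)), linear_form((1, 1, -3))
print(wedge(za, zb)[1]([A, B]), wedge(zb, za)[1]([A, B]))  # 50 -50, cf. (11.20)
```

Because $X_1\ldots X_c\omega = \omega(X_c, \ldots, X_1)$ in traditional notation, the value 1 on $[e_3, e_2, e_1]$ corresponds to the familiar normalization $(\zeta_1\wedge\zeta_2\wedge\zeta_3)(e_1, e_2, e_3) = 1$.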

Proof. A necessary and sufficient condition for (11.19) to hold is that

$$X(\zeta\wedge\omega) = (X\zeta)\omega - \zeta\wedge X\omega; \qquad (11.22)$$

in particular,

$$\zeta\wedge\omega = \omega\zeta \quad \text{for a 0-form } \omega. \qquad (11.23)$$

This completely specifies the exterior product of a linear form ζ and an alternating (c + 1)-form
ω, given the exterior product with an alternating c-form. Therefore, if the exterior
product exists, it is unique.

To prove the existence of the exterior product, we have to define the exterior product of
a linear form ζ and an alternating c-form ω to be the expression ζ∧ω defined for c = 0
by (11.23) and for c > 0 recursively by (11.22). To show that we really get an alternating
(c + 1)-form, we need to show that X(ζ∧ω) is alternating for c > 0 and any vector field
X, and verify

$$(fX)(\zeta\wedge\omega) = f\big(X(\zeta\wedge\omega)\big) \qquad (11.24)$$

and

$$XX(\zeta\wedge\omega) = 0. \qquad (11.25)$$

One can define an exterior product ω′∧ω for arbitrary alternating forms ω, ω′, but we do not need it.



□

11.3.2 Theorem. There is a unique linear mapping d mapping vector fields to zero and
alternating c-forms to alternating (c + 1)-forms (for c = 0, 1, 2, ...) such that

$$L_X\omega = Xd\omega + d(X\omega) \qquad (11.26)$$

for all alternating c-forms ω and vector fields X. The alternating form dω is called the
exterior derivative of ω, and satisfies the exactness relation

$$dd\omega = 0 \qquad (11.27)$$

and the product rules

$$d(f\omega) = fd\omega + df\wedge\omega, \qquad (11.28)$$
$$d(\zeta\wedge\omega) = d\zeta\wedge\omega - \zeta\wedge d\omega, \qquad (11.29)$$
$$L_{fX}\omega = fL_X\omega + df\wedge X\omega, \qquad (11.30)$$

for all alternating forms ω, scalar fields f, linear forms ζ, and vector fields X. Note the
minus sign in (11.29).

Proof. A necessary and sufficient condition for (11.26) to hold is that

$$X(d\omega) = L_X\omega - d(X\omega); \qquad (11.31)$$

in particular,

$$X(d\omega) = Xd\,\omega = L_X\omega \quad \text{for a 0-form } \omega. \qquad (11.32)$$

This completely specifies the exterior derivative of an alternating c-form. Therefore, if the
exterior derivative exists, it is unique. To prove the existence of the exterior derivative, we
have to define the exterior derivative of an alternating c-form ω to be the expression dω
determined for c = 0 by (11.32) and for c > 0 recursively by (11.31).

To show that we really get an alternating (c + 1)-form, we need to show that X(dω) is
alternating for c > 0 and any vector field X, and verify

$$(fX)d\omega = f(Xd\omega), \qquad (11.33)$$

which shows that Xdω is E-linear in X, so that dω is a (c + 1)-linear form, and

$$XXd\omega = 0, \qquad (11.34)$$

which proves antisymmetry. The proof of (11.33) is based on (11.30). To prove (11.30), we
get inductively

$$Y(L_{fX}\omega - fL_X\omega - df\wedge X\omega) = YL_{fX}\omega - fYL_X\omega - Y(df\wedge X\omega) = L_{fX}(Y\omega) - (L_{fX}Y)\omega - f\big(L_X(Y\omega) - (L_XY)\omega\big) - \big((Y\,df)X\omega - df\wedge YX\omega\big).$$

Using the induction hypothesis on the first term and (11.8) on the second term, one finds
that all terms cancel. From (11.30) one obtains

$$(fX)d\omega = L_{fX}\omega - d(fX\omega) = fL_X\omega + df\wedge X\omega - \big(fd(X\omega) + df\wedge X\omega\big) = f\big(L_X\omega - d(X\omega)\big) = f(Xd\omega),$$

showing that (11.33) holds. (11.34) follows inductively from

$$XXd\omega = XL_X\omega - Xd(X\omega) = XL_X\omega - L_X(X\omega) + d(XX\omega) = -(L_XX)\omega + d(XX\omega) = 0 + 0 = 0.$$

To prove the product rule (11.28), we first note that for 0-forms ω it follows from the
derivation property of Xd together with (11.23), and then inductively

$$Xd(f\omega) = L_X(f\omega) - d(fX\omega) = (L_Xf)\omega + fL_X\omega - \big(fd(X\omega) + df\wedge X\omega\big) = f\big(L_X\omega - d(X\omega)\big) + (Xdf)\omega - df\wedge X\omega = fXd\omega + X(df\wedge\omega) = X(fd\omega + df\wedge\omega),$$

completing the induction.

Note that the relation $L_X = Xd$, valid on scalar fields, fails to hold for the extension of $L_X$ and d to
alternating forms; compare (11.26).

To prove the exactness relation (11.27), we need the formula

$$d(L_X\omega) = L_X(d\omega). \tag{11.35}$$

Indeed, $Yd(L_X\omega) = L_Y(L_X\omega) - d(YL_X\omega) = L_YL_X\omega - d\big(L_X(Y\omega) - (L_XY)\omega\big)$, whereas

$$YL_X(d\omega) = L_X(Y\,d\omega) - (L_XY)\,d\omega = L_X\big(L_Y\omega - d(Y\omega)\big) - \big(L_{L_XY}\omega - d((L_XY)\omega)\big) = L_YL_X\omega + d((L_XY)\omega) - L_Xd(Y\omega),$$

since $L_{L_XY} = L_{X\angle Y} = [L_X, L_Y] = L_XL_Y - L_YL_X$. Comparing the two expressions, one finds that $Yd(L_X\omega) - YL_X(d\omega) = L_Xd(Y\omega) - d(L_X(Y\omega))$, which vanishes inductively.

Now (11.27) follows inductively from the relation

$$X\,d(d\omega) = L_Xd\omega - d(X\,d\omega) = L_Xd\omega - d\big(L_X\omega - d(X\omega)\big) = L_Xd\omega - d(L_X\omega) + dd(X\omega) = dd(X\omega),$$

obtained by using (11.35); the right-hand side vanishes by the induction hypothesis. (For 0-forms, the $dd$-term is absent, which starts the induction.) □

In particular, the exterior derivative of the linear form $\zeta$ is the alternating bilinear form $d\zeta$ with

$$YX\,d\zeta = YL_X\zeta - XL_Y\zeta + (X\angle Y)\zeta,$$

since $YX\,d\zeta = Y\big(L_X\zeta - d(X\zeta)\big) = YL_X\zeta - L_Y(X\zeta) = YL_X\zeta - \big((L_YX)\zeta + XL_Y\zeta\big) = YL_X\zeta - XL_Y\zeta - (Y\angle X)\zeta$.

An alternating $c$-form $\omega$ is called closed if $d\omega = 0$, and exact if it can be written in the form $\omega = d\theta$ for some $(c-1)$-form $\theta$. In particular, a linear form $\zeta$ is exact if it has the form $\zeta = df$ for some scalar field $f$.

By (11.27), every exact $c$-form is closed. The converse is not generally valid but holds in simple cases, e.g., by the Poincaré Lemma, when $E = C^\infty(M)$, where $M$ is a nonempty, open and convex subset of $\mathbb{R}^n$.
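The Poincaré Lemma can be made concrete for linear forms on a convex subset of $\mathbb{R}^2$ containing the origin. There, a closed form $\zeta$ has the potential $f(x) = \int_0^1 \zeta(tx)\cdot x\,dt$. The Python sketch below is a numerical illustration only, not a proof. It uses a hypothetical example form $\zeta = (2xy,\ x^2 + \cos y)$, and the helper names are our own. It checks closedness in coordinates, $\partial_1\zeta_2 - \partial_2\zeta_1 = 0$, which is the coordinate form of $d\zeta = 0$, by central differences. It then recovers the potential $x^2y + \sin y$ by Simpson's rule. A form that is not closed, such as $(-y, x)$, fails the test and therefore cannot be exact.

```python
import math

def zeta(x, y):
    """Hypothetical closed 1-form on R^2, given by its components (zeta_1, zeta_2)."""
    return (2.0 * x * y, x * x + math.cos(y))

def rotation_form(x, y):
    """Hypothetical 1-form -y dx + x dy; it is not closed (d of it is 2 dx ^ dy)."""
    return (-y, x)

def curl(form, x, y, h=1e-5):
    """Coordinate expression d_1 zeta_2 - d_2 zeta_1 of d(zeta), by central differences."""
    d1_z2 = (form(x + h, y)[1] - form(x - h, y)[1]) / (2 * h)
    d2_z1 = (form(x, y + h)[0] - form(x, y - h)[0]) / (2 * h)
    return d1_z2 - d2_z1

def radial_potential(form, x, y, n=200):
    """f(x) = int_0^1 zeta(t x) . x dt, via composite Simpson's rule (n must be even)."""
    def integrand(t):
        z1, z2 = form(t * x, t * y)
        return z1 * x + z2 * y
    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3

x0, y0 = 1.3, -0.7
print(curl(zeta, x0, y0))              # close to 0: zeta is closed
print(curl(rotation_form, x0, y0))     # close to 2: not closed, hence not exact
print(radial_potential(zeta, x0, y0), x0 * x0 * y0 + math.sin(y0))  # agree closely
```

The radial line integral is the potential used in the standard proof of the lemma for star-shaped domains. That is why convexity, or more generally star-shapedness, appears in the hypothesis.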
11.4 Manifolds as differential geometries

A central notion for analysis on infinite-dimensional spaces is that of a convenient vector space. This notion is discussed in detail in Kriegl & Michor [141], and refines the notion of a Hausdorff vector space, which is a vector space with the minimal amount of topological structure to allow the definition of a meaningful limit. A convenient vector space has in addition a meaningful notion of differentiability of paths, a property essential for differential geometry on manifolds. (For more details on basic notions from topology and functional analysis, see, for example, Rudin [217].)

11.4.1 Definition. A vector space $F$ over $\mathbb{R}$ is called locally convex if there is a family $S$ of seminorms, i.e., mappings $s: F \to \mathbb{R}$ such that

$$s(\alpha x + \beta y) \le |\alpha|\,s(x) + |\beta|\,s(y)$$

for all $x, y \in F$ and all $\alpha, \beta \in \mathbb{R}$, with the property that

$$s(x) = 0 \text{ for all } s \in S \;\Rightarrow\; x = 0.$$

A locally convex vector space becomes a Hausdorff space by defining a neighborhood of $x \in F$ to be a set containing for each $s \in S$ some set of the form $\{y \in F \mid s(y - x) < r\}$ for some real number $r > 0$. Thus a sequence $x_l$ ($l = 0, 1, 2, \dots$) in $F$ converges to $x \in F$ iff $s(x_l - x) \to 0$ for all $s \in S$. A path $\pi$, i.e., a continuous mapping $\pi: \mathbb{R} \to F$, is called smooth or arbitrarily often differentiable if there are paths $\pi^{(k)}: \mathbb{R} \to F$ ($k = 0, 1, 2, \dots$) such that

$$\pi(t + h) = \sum_{k=0}^{n} \frac{h^k}{k!}\,\pi^{(k)}(t) + O(h^{n+1})$$

for all $t, h \in \mathbb{R}$ and all natural numbers $n$. Clearly, $\pi^{(0)} = \pi$, and we write $\dot\pi := \pi^{(1)}$.

The reader should verify that, for any seminorm $s$ and $\alpha \in \mathbb{R}$, $x \in F$,

$$s(0) = 0, \qquad s(\alpha x) = |\alpha|\,s(x) \ge 0,$$

that in any locally convex vector space, addition and scalar multiplication are continuous, and that differentiation satisfies the traditional rules.
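Definition 11.4.1 can be made concrete with a small example. The Python sketch below is our own illustration, and the function names are hypothetical. It takes $F = \mathbb{R}^2$ with the family $S = \{s_1, s_2\}$, where $s_i(x) = |x_i|$. Each $s_i$ satisfies the seminorm inequality but is not a norm, since $s_1$ vanishes on the nonzero vector $(0, 1)$. Only the family as a whole separates points, which is exactly the condition "$s(x) = 0$ for all $s \in S \Rightarrow x = 0$". Random sampling can refute the inequality but cannot prove it.

```python
import random

def s1(x):
    """Hypothetical seminorm on R^2 that only sees the first coordinate."""
    return abs(x[0])

def s2(x):
    """Hypothetical seminorm on R^2 that only sees the second coordinate."""
    return abs(x[1])

def satisfies_seminorm_inequality(s, trials=1000, seed=0):
    """Test s(a x + b y) <= |a| s(x) + |b| s(y) on random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        a, b = rng.uniform(-3, 3), rng.uniform(-3, 3)
        lhs = s((a * x[0] + b * y[0], a * x[1] + b * y[1]))
        rhs = abs(a) * s(x) + abs(b) * s(y)
        if lhs > rhs + 1e-12:
            return False
    return True

def vanishes_on(family, x):
    """True if every seminorm in the family is zero at x."""
    return all(s(x) == 0 for s in family)

print(satisfies_seminorm_inequality(s1), satisfies_seminorm_inequality(s2))  # True True
print(vanishes_on([s1], (0.0, 1.0)))      # True: {s1} alone does not separate points
print(vanishes_on([s1, s2], (0.0, 1.0)))  # False: {s1, s2} does
```

With $S = \{s_1, s_2\}$, the topology defined above is the usual Euclidean topology of $\mathbb{R}^2$, even though neither seminorm is a norm on its own.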

11.4.2 Definition. A convenient vector space is a locally convex vector space $F$ over $\mathbb{R}$ such that every smooth path in $F$ is the derivative of another path in $F$. A complex-valued function $f$ on a nonempty and open subset $M$ of $F$ is called smooth or arbitrarily often differentiable if the complex-valued function $f \circ \pi$ is smooth for every smooth path $\pi: \mathbb{R} \to M$. The space of smooth functions $f: M \to \mathbb{C}$ is denoted by $C^\infty(M)$.

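The reader should verify that in a convenient vector space there is a mapping $d: C^\infty(M) \to \operatorname{Lin}(C^\infty(M), F^*)$, called the gradient, such that

$$\frac{d}{dt} f(\pi(t)) = \dot\pi(t)\,df(\pi(t))$$

for all $f \in C^\infty(M)$, all arbitrarily often differentiable paths $\pi: \mathbb{R} \to M$ and all $t \in \mathbb{R}$.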
11.4.3 Example. The space $F = \mathbb{R}^n$ is convenient, with $S$ consisting of the Euclidean norm only. Other examples of convenient vector spaces are Hilbert spaces and Schwartz spaces; see Kriegl & Michor [142].

11.4.4 Definition. Let $E$ be a differential geometry with Lie algebra $W$ of vector fields.

(i) A point is an algebra homomorphism from $E$ to $\mathbb{C}$ which maps 1 to 1. We write $M(E)$ for the set of points, and say that the point $\xi$ maps the scalar field $f$ to the value $f(\xi)$ of $f$ at $\xi$.

(ii) If $F$ is a convenient vector space, an $F$-chart is a homomorphism $C$ from $E$ to $C^\infty(U(C), \mathbb{C})$ for some nonempty, open subset $U(C)$ of $F$.

(iii) An $F$-manifold is a differential geometry satisfying the axioms

(M1) If $f(\xi) = 0$ for all $\xi \in M(E)$ then $f = 0$.

(M2) For all charts $C$ and all $x \in U(C)$, there is a unique $\xi = \xi_C(x) \in M(E)$ such that

$$f(\xi_C(x)) = (Cf)(x) \quad \text{for all } f \in E.$$
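(M3) For all charts $C$ and all $\delta \in \operatorname{Der} C^\infty(U(C))$, there is a unique $\hat\delta \in \operatorname{Der} E$ such that

$$(\hat\delta f)(\xi) = 0 \quad \text{for all } \xi \notin \xi_C(U(C)),$$
$$(\hat\delta f)(\xi_C(x)) = (\delta Cf)(x) \quad \text{for all } x \in U(C).$$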

Informally, property (M1) says that there are sufficiently many points to separate scalar fields. It implies not only that $E$ is commutative, since $(fg - gf)(x) = f(x)g(x) - g(x)f(x) = 0$ for all points $x$, but also excludes many other commutative algebras, such as nontrivial quotients of the algebra of polynomials in a single variable.

(M2) expresses that charts are sufficiently large to represent scalar fields locally, and (M3) says that there are sufficiently many derivations to reduce differentiation locally to charts.



11.5 Manifolds as topological spaces 

11.5.1 Definition. (i) Let $F$ be a convenient vector space. A manifold modeled on $F$ (short $F$-manifold, or simply manifold* if $F$ is apparent from the context) is a set $M$ whose elements are called points, together with a family $C$ of maps $\xi: U \to M$ from a ($\xi$-dependent) nonempty open subset $U$ of $F$ to $M$, called charts, with the properties

(SM1) Every point of $M$ is in the range $\xi[U]$ of some chart $\xi: U \to M$;

*More precisely, this defines arbitrarily often differentiable, real manifolds whose dimension need not be finite. There are a number of other notions of a manifold which make somewhat different assumptions.
(SM2) A map $\xi: U \to M$ is in $C$ if and only if it is injective and, for every nonempty open subset $V$ of $U$ and every chart $\xi': U' \to M$ in $C$ with $\xi[V] \subseteq \xi'[U']$,

$$\xi'^{-1} \circ \xi|_V \in C^\infty(V, F).$$

The manifold is canonically a topological space by declaring as open sets arbitrary unions 
of finite intersections of ranges of charts. 

(ii) The inverse of a chart is called a local coordinate system. An atlas is a family of 
charts whose ranges cover M; the family C of all charts is the universal atlas. 

(iii) The dimension of $F$ is called the dimension of $M$. In particular, $M$ is called finite-dimensional ($d$-dimensional) if $\dim F < \infty$ (resp. $\dim F = d$).

(iv) A mapping $F$ from $M$ to some convenient vector space $V$ is called smooth (or infinitely differentiable) if $F \circ \xi \in C^\infty(U, V)$ for every chart $\xi: U \to M$. A scalar field on $M$ is a smooth complex-valued function on $M$; the algebra of all scalar fields on $M$ with pointwise operations is denoted by $C^\infty(M)$. A derivation on $M$ is a mapping $\delta \in \operatorname{Lin} C^\infty(M)$ satisfying

$$\delta(fg) = (\delta f)g + f(\delta g) \quad \text{for all } f, g \in C^\infty(M).$$

Thus a derivation on $M$ is an element of $\operatorname{Der} C^\infty(M)$, which we also denote by $\operatorname{Der} M$.

(v) A canonical differential geometry whose scalar fields form the algebra E = C°°(M) of a 
manifold M is called a differential geometry of M. 

Note that a chart $\xi$ is injective; hence its inverse, the corresponding local coordinate system, is well-defined on the range of the chart and maps a nonempty, open subset of $M$ to an open subset of $F$. In many treatments of differential geometry, the local coordinate system rather than $\xi$ is called the chart.

11.5.2 Proposition. 

(i) The set $C^\infty(M)$ of all scalar fields on $M$ is a commutative $*$-algebra under pointwise multiplication.

(ii) The set $\operatorname{Lie} M := \operatorname{Der} C^\infty(M)$ of all derivations on $M$ with the commutator as Lie product is a Lie algebra.

Proof. This is left to the reader as a straightforward exercise. □ 

11.5.3 Theorem. $F$-manifolds in the sense of Definition 11.4.4(iii) and $F$-manifolds in the sense of Definition 11.5.1(i) are equivalent concepts.

Proof. □ 

The motivating example defining the terminology is the surface of the earth, the globe, which may be regarded as a 2-dimensional manifold $M$ with $F = \mathbb{R}^{1\times 2}$, the vector space of 2-dimensional row vectors*. Here the domain $U$ of a chart $\xi: U \to M$ may be viewed as the paper on which the chart (a road map, say) is printed. The important points $q = (x, y) \in U$ correspond to Cartesian coordinates of marks on the road map labeled by towns, and $\xi(q)$ denotes the location of the corresponding town on the globe. Note that our charts may have domains which are not neatly cut and may be disconnected or unbounded.

The simplest examples of manifolds are open, nonempty subsets of $\mathbb{R}^n$.

11.5.4 Example. We consider the concrete case where smooth manifolds modeled on the vector spaces $\mathbb{R}^d$ for some $d \in \mathbb{N}$ are embedded into a bigger vector space $\mathbb{R}^n$, and where membership in the manifold is characterized by $m$ equations $F_k(x) = 0$ ($k = 1, \dots, m$). For example, a $d$-sphere is the set of points $x \in \mathbb{R}^{d+1}$ satisfying the single ($m = 1$) equation $F(x) := x^Tx - 1 = 0$, where the superscript $T$ denotes the transpose.

Let $M_0$ be an open subset of $\mathbb{R}^n$ and let $F \in C^\infty(M_0, \mathbb{R}^m)$. The gradient of $F$ at $x$ is given by

$$dF(x) = F'(x)^T \in \mathbb{R}^{n\times m}.$$

This generalizes the traditional terminology for the case where $m = 1$. The implicit function theorem implies that if the gradient has constant rank $m$, i.e., $\operatorname{rk} dF(x) = m$ for all $x \in M_0$, then the set $M$ given by

$$M = \{x \in M_0 \mid F(x) = 0\}$$

is a $d$-dimensional manifold with $d = n - m$. If $M$ is a $d$-dimensional manifold given by an equation $F(x) = 0$, then the tangent space at a point $x \in M$ is given by

$$T_xM = \{X \in \mathbb{R}^{1\times n} \mid X \cdot dF(x) = 0\}.$$

Thus the tangent space consists of those vectors perpendicular to the gradient, that is, the tangent vectors at $x$ are tangent to $M$ at $x$; hence the name tangent space. The vector fields of $M$ are similarly given by

$$\operatorname{vect} M = \{X \in C^\infty(M, \mathbb{R}^{1\times n}) \mid X(x)\cdot dF(x) = 0 \text{ for all } x \in M\}.$$
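For the $d$-sphere, $F(x) = x^Tx - 1$ has gradient $dF(x) = 2x$, so the tangent condition $X\cdot dF(x) = 0$ says that $X$ is orthogonal to $x$. The short Python sketch below is illustrative only, and its names are hypothetical. It projects an arbitrary vector onto $T_xM$ at a point of the 2-sphere by removing the component along the gradient, and then checks the defining condition.

```python
import math

def grad_F(x):
    """Gradient of the hypothetical constraint F(x) = x.x - 1, i.e. dF(x) = 2x."""
    return [2.0 * xi for xi in x]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_to_tangent(x, v):
    """Remove from v its component along dF(x); the result satisfies X . dF(x) = 0."""
    g = grad_F(x)
    c = dot(v, g) / dot(g, g)
    return [vi - c * gi for vi, gi in zip(v, g)]

x = [1.0 / math.sqrt(3.0)] * 3   # a point on the 2-sphere in R^3
v = [1.0, -2.0, 0.5]             # an arbitrary vector in R^3
X = project_to_tangent(x, v)
print(X)                          # approximately [7/6, -11/6, 2/3]
print(dot(X, grad_F(x)))          # 0 up to rounding: X lies in T_x M
```

For $m > 1$ constraints, the same idea applies with the $m$ columns of $dF(x)$. One then removes the orthogonal projection onto their span instead of onto a single direction.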



Given an $F$-manifold $M$ and an $F'$-manifold $N$, we define $C^\infty(M, N)$ as the set of maps $A: M \to N$ such that, whenever $\xi: U \to M$ is a chart on $M$ and $\xi': V \to N$ is a chart on $N$ with $A(\xi(U)) \subseteq \xi'(V)$, we have $(\xi')^{-1} \circ A \circ \xi \in C^\infty(U, V)$. A diffeomorphism of $M$ is an invertible mapping in $C^\infty(M, M)$ with an inverse in $C^\infty(M, M)$; we write $Ax$ for the image of a point $x \in M$ under a diffeomorphism $A$.

We assume that the identity map on $M$ is in $C^\infty(M, M)$ and that the composition of $f \in C^\infty(M, N)$ and $g \in C^\infty(N, N')$ is in $C^\infty(M, N')$, a condition** automatically satisfied in finite dimensions. Then the set $\operatorname{Diff} M$ of all diffeomorphisms is a group under composition of maps. Additional conditions are needed to ensure that $\operatorname{Diff} M$ is a $\operatorname{vect} M$-manifold and hence (in the terminology of Section 11.7 below) a Lie group; see, e.g., Neeb [179], where one can find a detailed discussion of pathologies that can arise in infinite-dimensional Lie groups.

*It is convenient to think of points as row vectors; then tangent vectors are row vectors, too, and gradients of scalar fields are naturally column vectors. Thus later expressions like the directional derivative $X\,df$ of a scalar field $f$ in the direction of a vector field $X$ have a natural interpretation as "scalar = row times column" in terms of ordinary matrix algebra.

**In technical terms this says that the modeling vector spaces should admit a category of smooth manifolds.

We define a motion on $M$ as a mapping $A \in C^\infty([0, 1], \operatorname{Diff}(M))$ such that $A(0) = 1$ is the identity. The intuition is that the point $x = A(0)x$ of an object (subset of $M$) at time $t = 0$, the start of the motion, is moved by the motion to the point $A(t)x$ at time $t \in [0, 1]$, ending up in $A(1)x$ at the end of the motion. For every motion $A$ and for all $t \in [0, 1]$ we define a vector field $\dot A(t)$ on $M$ by

$$\dot A(t)\,df: x \mapsto \frac{d}{ds} f\big(A(s)A(t)^{-1}x\big)\Big|_{s=t}$$

for all scalar fields $f$ and all $x \in M$. Since the product rule holds for smooth functions in $C^\infty(M)$, the object $\dot A(t)d$ is indeed a derivation on $M$, and hence $\dot A(t)$ is a vector field. From the definition of $\dot A(t)$ we get the chain rule

$$\frac{d}{dt} f(A(t)x) = \dot A(t)\,df(A(t)x).$$

If we are only interested in what happens in an infinitesimal neighborhood of a point $x \in M$, the vector fields in

$$N(x) := \{X_0 \in \operatorname{vect} M \mid X_0\,df(x) = 0 \text{ for all } f \in C^\infty(M)\}$$

have no effect at $x$. Since $N(x)$ is a vector space, we can form the quotient space

$$T_xM = \operatorname{vect} M / N(x),$$

called the tangent space or tangent (hyper-)plane at $x$. We denote by

$$X(x) := X + N(x)$$

the equivalence class of $X$ with respect to the equivalence relation $X \sim Y \Leftrightarrow X - Y \in N(x)$. We call the equivalence class that contains the vector field $X$ the tangent vector of $X$ at the point $x$.

The union $TM$ of all $T_xM$ is naturally a manifold called the tangent bundle of $M$.



The Lie derivative in the traditional approach. In the special case where $E = C^\infty(M)$ for some finite-dimensional manifold $M$, there is an alternative, traditional route to the calculus on manifolds, using the following traditional definition of the Lie derivative.

For any vector field $X$, the initial value problem

$$\xi(0) = x, \qquad \dot\xi(\tau) = X(\xi(\tau)) \tag{11.36}$$

is solvable for every $x \in M$, for $\tau$ in some $x$-dependent neighborhood of zero. This follows from the standard theory of ordinary differential equations, since continuously differentiable vector fields are locally Lipschitz.

We denote by $e^{\tau X}$ the local diffeomorphism which maps $x$ into the value $\xi(\tau)$ of the solution $\xi$ of (11.36). Clearly, $e^{0X} = 1$ is the identity, but for fixed $\tau$, the map $e^{\tau X}$ need not be defined everywhere. It is defined everywhere for every $\tau$ only when (11.36) is solvable for all $\tau \in \mathbb{R}$; in this case, the vector field is called complete, and the $e^{\tau X}$ form a 1-parameter group of diffeomorphisms. In general, we have, on the domain of definition,

$$\frac{d}{d\tau}\,e^{\tau X}x = X(e^{\tau X}x).$$

We define the directional derivative $L_X\phi$ of a tensor field $\phi$ with respect to the complete vector field $X$ by

$$(L_X\phi)(x) := \frac{d}{d\tau}\,\phi(e^{\tau X}x)\Big|_{\tau=0}.$$

This defines a linear differential operator $L_X$ mapping tensor fields to tensor fields of the same type $[c, r]$, called the Lie derivative with respect to $X$. Clearly,

$$(e^{\tau L_X}\phi)(x) = \phi(e^{\tau X}x).$$
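The traditional definition can be tried out numerically. The Python sketch below is our own illustration, and its names and example are hypothetical. It takes the complete rotation field $X(x, y) = (-y, x)$ on $\mathbb{R}^2$ and approximates $e^{\tau X}$ by solving (11.36) with the classical fourth-order Runge-Kutta method. It then estimates $(L_Xf)(x) = \frac{d}{d\tau} f(e^{\tau X}x)|_{\tau=0}$ for the scalar field $f(x, y) = x^2y + y$ by a central difference. For a scalar field, the result should agree with the directional derivative $X\,df$.

```python
import math

def X(p):
    """Hypothetical example: rotation vector field X(x, y) = (-y, x), which is complete."""
    x, y = p
    return (-y, x)

def flow(field, p, tau, steps=1000):
    """Approximate e^{tau X} p by integrating xi' = X(xi), xi(0) = p with classical RK4."""
    h = tau / steps
    x, y = p
    for _ in range(steps):
        k1 = field((x, y))
        k2 = field((x + 0.5 * h * k1[0], y + 0.5 * h * k1[1]))
        k3 = field((x + 0.5 * h * k2[0], y + 0.5 * h * k2[1]))
        k4 = field((x + h * k3[0], y + h * k3[1]))
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return (x, y)

def f(p):
    x, y = p
    return x * x * y + y

def lie_derivative(field, phi, p, eps=1e-4):
    """(L_X phi)(p) ~ d/dtau phi(e^{tau X} p) at tau = 0, by a central difference."""
    forward = phi(flow(field, p, eps, steps=10))
    backward = phi(flow(field, p, -eps, steps=10))
    return (forward - backward) / (2 * eps)

def directional_derivative(p):
    """X df at p, using the hand-computed gradient df = (2xy, x^2 + 1)."""
    x, y = p
    vx, vy = X(p)
    return vx * (2 * x * y) + vy * (x * x + 1)

p0 = (0.8, -0.3)
print(lie_derivative(X, f, p0), directional_derivative(p0))  # both about 1.168
print(flow(X, (1.0, 0.0), math.pi / 2))                      # about (0, 1)
```

The group property $e^{sX}e^{tX} = e^{(s+t)X}$ of a complete field can be checked in the same way, by comparing `flow(X, flow(X, p0, 0.3), 0.5)` with `flow(X, p0, 0.8)`.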

It is not difficult to show the chain rule

$$\dot\xi(\tau) = X(\tau)(\xi(\tau)) \;\Rightarrow\; \frac{d}{d\tau}\,\phi(\xi(\tau)) = (L_{X(\tau)}\phi)(\xi(\tau))$$

for every smooth path $\xi: [0, 1] \to M$. Here $\dot\xi(\tau)$ is the tangent vector of the path at $\xi(\tau)$. The chain rule implies the product rule (11.15) for the Lie derivative. Therefore Theorem 11.2.3 implies that the traditional concept coincides with our algebraic concept when $E$ is the algebra of scalar fields of a finite-dimensional manifold.

In the infinite-dimensional case, this approach can also be carried through, although it requires considerable technicalities to account for the fact that initial-value problems for differential equations in infinite-dimensional spaces are not always solvable. For details, see Kriegl & Michor [141].



11.6 Noncommutative geometry 

In this short section, we indicate how things generalize to noncommutative geometry, without giving details; the reader not familiar with the notions used may simply skip the section.

In noncommutative geometry, position measurements are limited by uncertainty relations. The notion of a point therefore loses its meaning, and the evaluation of functions and vectors at a point is no longer well-defined. Thus, in noncommutative geometry, a manifold of points no longer exists, but in place of $C^\infty(M)$ one has a noncommutative algebra $E$ whose elements behave in a way analogous to scalar fields. All constructions based only on this algebra rather than on a manifold generalize in an appropriate way to the noncommutative situation. Thus most geometric notions extend formally, but they can be matched with true geometric concepts only in certain commutative subalgebras. The basic observation is that a point evaluation is a $*$-homomorphism of $C^\infty(M)$ to $\mathbb{C}$, and conversely, all such homomorphisms of $C^\infty(M)$ are obtained as point evaluations. Now, if $E_0$ is a commutative normed $*$-subalgebra of $E$ whose completion is a $B^*$-algebra (a term we shall not use further, and hence do not introduce formally), then one can reconstruct on $E_0$ a topological space $M_0$ by calling the characters of $E_0$ points; the $B^*$-algebra is then canonically isomorphic to the algebra of bounded continuous functions on $M_0$. If $E_0$ admits sufficiently many derivations, then $M_0$ is a (smooth) manifold. When $E_1$ and $E_2$ are two such commutative subalgebras that do not commute, then, in contrast to the commutative situation, the corresponding manifolds $M_1$ and $M_2$ are not naturally embedded into a bigger manifold. Thus there may be many maximal manifolds embedded in a single noncommutative geometry.



11.7 Lie groups as manifolds 

This section defines Lie groups in full generality. Differential equations defining the flow 
along a vector field naturally produce Lie groups and the exponential map, which relates 
Lie groups and Lie algebras. 

11.7.1 Definition. 

(i) A Lie group is a group $G$ which is at the same time a manifold, such that multiplication and inversion are arbitrarily often differentiable. Thus a Lie group is both a manifold and a group, and the two structures are compatible. The identity element in a Lie group will always be written as 1.

(ii) We canonically embed $G$ into $\operatorname{Diff}(G)$ by associating to $A \in G$ the map $B \mapsto AB$, which is a diffeomorphism. For the definition of the Lie algebra associated with a Lie group, it is important to know that the group $G$ acts on $C^\infty(G)$ by right multiplication, that is, to every $A \in G$ we associate the map $R_A: C^\infty(G) \to C^\infty(G)$ given by

$$(R_A\varphi)(B) := \varphi(BA)$$

for all $B \in G$ and $\varphi \in C^\infty(G)$. Of course, the group also acts by left multiplication on $C^\infty(G)$, but this action is not directly related to the Lie algebra.

(iii) The Lie algebra $\operatorname{vect} G$ contains the set

$$\log G := \{X \in \operatorname{vect} G \mid R_B \angle Xd = 0 \text{ for all } B \in G\}$$

of invariant vector fields.

It is not difficult to show that every Lie group in the above sense is a Lie group in the sense of Definition 8.3.2, since $G$ is canonically embedded into $\operatorname{Lin} C^\infty(G)$. The converse is also valid but a bit more difficult to establish.
11.7.2 Proposition. The invariant vector fields $\log G$ form a Lie algebra.

Proof. To check the statement, we only need to show that the Lie product $X \angle Y$ of two invariant vector fields $X$ and $Y$ is invariant. But this follows from Proposition 8.2.4 since the invariant vector fields form the centralizer of the set $\{R_A \mid A \in G\}$. □

11.7.3 Proposition. For any smooth motion $A \in C^\infty([0, 1], G)$, where $G$ is identified with a subset of $\operatorname{Diff}(G)$, the vector field $\dot A(t)$ is an invariant vector field.

Proof. We know that $\dot A(t)$ is a vector field. Hence we need to check that $\dot A(t)$ Lie commutes with $R_B$ for all $B \in G$. For arbitrary $B, C \in G$ and $\varphi \in C^\infty(G)$ we have

$$\begin{aligned}
(R_B \angle \dot A(t)d)\varphi(A(t)C) &= R_B(\dot A(t)d\varphi)(A(t)C) - \dot A(t)d(R_B\varphi)(A(t)C)\\
&= \dot A(t)d\varphi(A(t)CB) - \frac{d}{dt}(R_B\varphi)(A(t)C)\\
&= \frac{d}{dt}\varphi(A(t)CB) - \frac{d}{dt}\varphi(A(t)CB) = 0.
\end{aligned}$$

□



11.7.4 Remarks. Note that an essential ingredient in the above proof is that the action of $G$ on $C^\infty(G)$ is defined from the right and the action of the vector field $\dot A(t)$ from the left.

11.7.5 Definition. A motion $A(t)$ is called a uniform motion if there exists a unique $f \in \log G$ such that

$$\dot A(t) = fA(t) \quad \text{for all } t \in [0, 1]. \tag{11.37}$$

In this case we write $e^f$ for the group element $A(1)$ and call it the exponential of $f$. Conversely, $f$ is called the infinitesimal generator of the motion.

Formula (11.37) is a linear differential equation with constant coefficients; the initial condition $A(0) = 1$ is already part of the definition of a motion. In finite dimensions, such initial value problems are uniquely solvable; in infinite dimensions, unique solvability depends on additional conditions. It is easy to check that a uniform motion with infinitesimal generator $f \in \log G$ is given by $A(t) = e^{tf}$.
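For a matrix group such as $GL(2, \mathbb{R})$ (see Example 11.7.6), $e^{tf}$ is the ordinary matrix exponential. The Python sketch below is an illustration under that assumption, and the helper names are our own. It computes the exponential by scaling and squaring a truncated Taylor series and takes the hypothetical generator $f = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$. It then checks numerically that $A(t) = e^{tf}$ satisfies $A(0) = 1$ and (11.37), $\dot A(t) = fA(t)$, using a central difference for $\dot A(t)$.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(A, terms=20):
    """Matrix exponential: halve A until its row-sum norm is <= 1/2, sum Taylor terms, square back."""
    norm = max(sum(abs(a) for a in row) for row in A)
    s = 0
    while norm > 0.5:
        norm /= 2.0
        s += 1
    B = mat_scale(1.0 / 2 ** s, A)
    result, term = identity(len(A)), identity(len(A))
    for k in range(1, terms):
        term = mat_scale(1.0 / k, mat_mul(term, B))
        result = mat_add(result, term)
    for _ in range(s):
        result = mat_mul(result, result)
    return result

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

f = [[0.0, -1.0], [1.0, 0.0]]      # hypothetical generator: an infinitesimal rotation

def A(t):
    return expm(mat_scale(t, f))    # uniform motion A(t) = e^{tf}

t, h = 0.7, 1e-5
A_dot = mat_scale(1.0 / (2 * h), mat_add(A(t + h), mat_scale(-1.0, A(t - h))))
print(max_abs_diff(A(0.0), identity(2)))       # 0.0, i.e. A(0) = 1
print(max_abs_diff(A_dot, mat_mul(f, A(t))))   # tiny: (11.37) holds numerically
print(A(1.0))  # e^f: rotation by 1 radian, [[cos 1, -sin 1], [sin 1, cos 1]]
```

Scaling and squaring keeps the Taylor series in a regime where few terms suffice. It is a simple textbook approach, not a production-grade matrix exponential.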

11.7.6 Example. In any associative algebra, the set of invertible elements is a group. In many cases, the group of invertible elements is a Lie group. In particular, the group $GL(n, \mathbb{K})$ of all invertible $n\times n$-matrices over $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$ is a Lie group, since it is the open set of points in $\mathbb{K}^{n\times n}$ where the determinant does not vanish, so that any point has an open neighborhood on which the identity is a chart. We can choose coordinates $x_{ij}$ for $1 \le i, j \le n$, and then $GL(n, \mathbb{K})$ is the open set where $\det(x_{ij}) \ne 0$. Any derivation is of the form

for all $f \in C^\infty(GL(n, \mathbb{K}))$ and for some $X^{ij} \in \mathbb{K}$. One finds that $\log GL(n, \mathbb{K}) = gl(n, \mathbb{K})$ is the Lie algebra of all $n\times n$-matrices over $\mathbb{K}$. It is easy to verify these properties by describing everything with matrices. The subgroup of $GL(n, \mathbb{K})$ consisting of the matrices with unit determinant is denoted by $SL(n, \mathbb{K})$. In other words, $SL(n, \mathbb{K})$ is the kernel of the map $\det: GL(n, \mathbb{K}) \to \mathbb{K}^*$, where $\mathbb{K}^*$ is the group of invertible elements in $\mathbb{K}$. The Lie algebra of $SL(n, \mathbb{K})$ is denoted by $sl(n, \mathbb{K})$ and consists of the traceless $n\times n$ matrices with entries in $\mathbb{K}$.
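Traceless matrices generate $SL(n, \mathbb{K})$ because of the identity $\det e^{f} = e^{\operatorname{tr} f}$. Applied to $tf$, it shows that $\det e^{tf} = 1$ for all $t$ exactly when $\operatorname{tr} f = 0$. The Python sketch below checks this numerically for hypothetical $3\times 3$ examples. It computes $e^f$ with a scaled and squared truncated Taylor series and evaluates determinants by cofactor expansion, which is fine for tiny matrices but not a general-purpose method.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(c, A):
    return [[c * a for a in row] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(A, terms=20):
    """Matrix exponential by scaling and squaring a truncated Taylor series."""
    norm = max(sum(abs(a) for a in row) for row in A)
    s = 0
    while norm > 0.5:
        norm /= 2.0
        s += 1
    B = mat_scale(1.0 / 2 ** s, A)
    result, term = identity(len(A)), identity(len(A))
    for k in range(1, terms):
        term = mat_scale(1.0 / k, mat_mul(term, B))
        result = mat_add(result, term)
    for _ in range(s):
        result = mat_mul(result, result)
    return result

def det(A):
    """Determinant by cofactor expansion along the first row (only for small n)."""
    if len(A) == 1:
        return A[0][0]
    total = 0.0
    for j in range(len(A)):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

f = [[0.3, 1.0, -0.2], [0.5, -0.1, 0.4], [0.0, 0.7, 0.5]]   # hypothetical, trace 0.7
g = [[0.3, 1.0, -0.2], [0.5, -0.1, 0.4], [0.0, 0.7, -0.2]]  # hypothetical, traceless
print(det(expm(f)), math.exp(trace(f)))   # equal up to rounding
print(det(expm(g)))                        # 1 up to rounding: e^g lies in SL(3, R)
```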



Chapter 12 

Conservative mechanics on manifolds 



We consider closed 2-forms in manifolds and their associated Poisson algebras. This naturally leads to symplectic geometry and a symplectic formulation of the dynamics of quantum mechanics. It also leads to classical Hamiltonian and Lagrangian mechanics, including constraints.



12.1 Poisson algebras from closed 2-forms 

In general, a classical, conservative dynamical system is described in terms of motion on a manifold $M$, called the phase space, such that some algebra of functions on it has a Poisson algebra structure; more precisely, $C^\infty(M)$ is equipped with a Lie product $\angle$ that is antisymmetric and satisfies the Jacobi identity and the Leibniz identity. Such a manifold is called a Poisson manifold; see, e.g., Vaisman [248] or da Silva & Weinstein [61]. Poisson manifolds provide a general setting for the study of the dynamics of classical conservative mechanical systems by differential geometric methods; for a more comprehensive discussion of different aspects see Marsden & Ratiu [164], Ratiu [202] and Morrison [175].

Every symplectic manifold is a Poisson manifold, since the symplectic structure gives rise to a natural Poisson bracket. In the symplectic case, a Hamiltonian is a function of some coordinates $q_i$ and the conjugate momenta $p_j$. In such cases, the phase space is even-dimensional. In more general cases described, e.g., by Lie-Poisson algebras, the phase space need not be a symplectic manifold. Indeed, symplectic manifolds are always even-dimensional, while the manifold $so(3)^*$ of the spinning top (see Section 9.3 and Section 9.2) has dimension 3.

Many Poisson algebras of relevance in classical mechanics may be constructed via a uniform 
construction based on a closed 2-form characterizing the kinematics of the system of interest. 
The description is then completed by specifying the dynamics through a Hamiltonian in 
the resulting Poisson algebra, and by selecting an initial state describing the preparation
of the system. In this section, we discuss the general construction principle. 

Let $\omega$ be a closed 2-form on a differential geometry $E$. We call a scalar field $f \in E$ compatible with $\omega$ if there is a vector field $X_f$ such that

$$df = X_f\omega; \tag{12.1}$$

any such $X_f$ is called a Hamiltonian vector field associated with $f$. We write $E(\omega)$ for the set of all scalar fields $f \in E$ which are compatible with $\omega$. In general, $X_f$ need not exist for all $f$, and if it exists, it need not be unique. Thus $E(\omega)$ may be a proper subspace of $E$; this situation is typical for examples arising from constrained Hamiltonian mechanics.

12.1.1 Proposition. Let $\omega$ be a symplectic form. Then every scalar field $f$ is compatible with $\omega$,

$$X_f = df\,\omega^{-1}, \tag{12.2}$$

and $E(\omega) = E$.

Proof. Since $\omega$ is a symplectic form, $\omega$ is nondegenerate and has an inverse satisfying (11.12). The defining condition for $X_f$ can therefore be solved uniquely for $X_f$, for all $f \in E$, resulting in (12.2). □

A vector field $X$ is called locally Hamiltonian (with respect to $\omega$) if the linear form $X\omega$ is closed, and Hamiltonian (with respect to $\omega$) if $X\omega$ is exact (and hence closed). Thus, for any $f \in E(\omega)$, the vector field $X_f$ is Hamiltonian with respect to $\omega$.

12.1.2 Proposition. If $X, Y$ are locally Hamiltonian vector fields with respect to the closed 2-form $\omega$, then $X \angle Y$ is Hamiltonian, and

$$(X \angle Y)\omega = d(XY\omega). \tag{12.3}$$

In particular, the locally Hamiltonian vector fields and the Hamiltonian vector fields form Lie subalgebras of $W$.

Proof. Since $\omega$ and $X\omega$ are closed, (11.26) implies that $L_X\omega = X\,d\omega + d(X\omega) = 0$. Again by (11.26),

$$d(XY\omega) = L_X(Y\omega) - X\,d(Y\omega) = L_X(Y\omega) = (L_XY)\omega + YL_X\omega = (L_XY)\omega = (X \angle Y)\omega,$$

using the closedness of $Y\omega$ and the product rule (11.15). This proves (12.3). The concluding statement is an immediate consequence. □

12.1.3 Theorem. For every closed 2-form $\omega$ over the manifold $M$, the set $E(\omega)$ is a Poisson algebra, with Lie product given by

$$f \angle g := X_f\,dg = X_fX_g\omega = -X_gX_f\omega = -X_g\,df. \tag{12.4}$$

A Hamiltonian vector field associated with $f \angle g$ is given by

$$X_{f\angle g} := X_f \angle X_g. \tag{12.5}$$
In particular, if $\omega$ is a symplectic form then

$$f \angle g = df\,\omega^{-1}dg. \tag{12.6}$$

Proof. We first show that $E(\omega)$ is a subalgebra of the algebra $E$. If $f, g \in E(\omega)$ and $\lambda \in \mathbb{C}$ then $\lambda f, f \pm g, fg \in E(\omega)$, since we may take

$$X_{\lambda f} = \lambda X_f, \qquad X_{f\pm g} = X_f \pm X_g, \qquad X_{fg} = fX_g + gX_f.$$

We next show that $f \angle g$ is well-defined. Indeed, if $X_f, X_f'$ are two Hamiltonian vector fields associated with $f$, then $(X_f' - X_f)\omega = df - df = 0$; hence $f \angle g = -X_g\,df$ does not depend on the choice of the Hamiltonian vector fields associated with $f$, and, by $f \angle g = X_f\,dg$, not on the choice of those associated with $g$.

Proposition 12.1.2 implies that (12.5) is a Hamiltonian vector field for $f \angle g$; therefore $f \angle g \in E(\omega)$.

The operation $\angle$ defined by (12.4) is bilinear, antisymmetric, and satisfies the Leibniz identity. To conclude that $E(\omega)$ is a Poisson algebra it therefore suffices to show that the Jacobi identity holds. This follows since, with $X := X_f$, $Y := X_g$,

$$\begin{aligned}
(f\angle g)\angle h &= X_{f\angle g}\,dh = (X_f \angle X_g)\,dh = (X \angle Y)\,dh = L_{X\angle Y}h\\
&= [L_X, L_Y]h = L_XL_Yh - L_YL_Xh = X_f\,d(X_g\,dh) - X_g\,d(X_f\,dh)\\
&= f\angle(g\angle h) - g\angle(f\angle h) = (f\angle h)\angle g + f\angle(g\angle h).
\end{aligned}$$

Finally, if $\omega$ is a symplectic form, (12.2) implies that the Lie product (12.4) can be rewritten in the form (12.6). □

Note that the Lie product can be extended to the case where one argument is in $E(\omega)$ and the other may be an arbitrary quantity from $E$:

$$f \angle g = X_f\,dg \quad \text{for } f \in E(\omega),\ g \in E,$$
$$f \angle g = -X_g\,df \quad \text{for } f \in E,\ g \in E(\omega).$$

Thus if $f$ is compatible with $\omega$, the Lie product is defined even when $g$ is not compatible with $\omega$.

In the manifold case, the above theorem defines, for each closed 2-form $\omega$ on an $F$-manifold $M$, a Poisson algebra $E(\omega)$, namely the set of functions $f \in E = C^\infty(M)$ which are compatible with $\omega$. In the special case where $\omega$ is symplectic, we have seen that $E(\omega) = E$; thus we may define the Poisson bracket

$$\{f, g\} := g \angle f = dg\,\omega^{-1}df \tag{12.7}$$

of $f, g \in E$. This is the traditional Poisson bracket associated with the symplectic space $(M, \omega)$.

The affine functions, which map $\xi \in M$ to $\xi v + \gamma$ for some $v \in F^*$ and some $\gamma \in \mathbb{C}$, satisfy

$$(\xi v + \gamma) \angle (\xi v' + \gamma') = v\,\omega^{-1}v' \in \mathbb{C},$$

hence form a Lie subalgebra, which is a Heisenberg algebra. This provides a faithful classical Poisson representation of general Heisenberg algebras.
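On the canonical phase space $\mathbb{R}^2$ with coordinates $(q, p)$, the Poisson algebra axioms and the Heisenberg relation for affine functions can be checked exactly with integer polynomial arithmetic. The Python sketch below is our own illustration. It assumes the canonical bracket $\{f, g\} = \partial_qf\,\partial_pg - \partial_pf\,\partial_qg$, which may differ by a sign from $\angle$ or from (12.7), depending on the sign convention chosen for $\omega$. It verifies that $\{q, p\}$ is constant, that brackets of affine functions are constants, and that antisymmetry, the Jacobi identity and the Leibniz identity hold on sample polynomials.

```python
from collections import defaultdict

# A polynomial in (q, p) is a dict {(i, j): c} meaning the sum of c * q**i * p**j.

def add(f, g):
    h = defaultdict(int)
    for m, c in list(f.items()) + list(g.items()):
        h[m] += c
    return {m: c for m, c in h.items() if c != 0}

def mul(f, g):
    h = defaultdict(int)
    for (a, b), c in f.items():
        for (d, e), k in g.items():
            h[(a + d, b + e)] += c * k
    return {m: c for m, c in h.items() if c != 0}

def scale(c, f):
    return {m: c * v for m, v in f.items() if c * v != 0}

def d_q(f):
    return {(a - 1, b): a * c for (a, b), c in f.items() if a > 0}

def d_p(f):
    return {(a, b - 1): b * c for (a, b), c in f.items() if b > 0}

def bracket(f, g):
    """Canonical Poisson bracket {f, g} = f_q g_p - f_p g_q (assumed sign convention)."""
    return add(mul(d_q(f), d_p(g)), scale(-1, mul(d_p(f), d_q(g))))

q, p = {(1, 0): 1}, {(0, 1): 1}
affine1 = {(1, 0): 2, (0, 1): 3, (0, 0): 5}     # 2q + 3p + 5
affine2 = {(1, 0): -1, (0, 1): 4, (0, 0): 1}    # -q + 4p + 1
f, g, h = {(2, 1): 1}, {(0, 3): 1, (1, 0): 1}, {(1, 1): 1}   # q^2 p, p^3 + q, q p

jacobi = add(add(bracket(f, bracket(g, h)), bracket(g, bracket(h, f))),
             bracket(h, bracket(f, g)))
print(bracket(q, p))               # {(0, 0): 1}
print(bracket(affine1, affine2))   # {(0, 0): 11}: a constant, as for a Heisenberg algebra
print(jacobi)                      # {}: the zero polynomial
```

Exact integer arithmetic is used so that identities such as Jacobi come out as exactly zero rather than zero up to rounding.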

12.1.4 Example. We continue the discussion of Example 11.1.3, where scalar fields (resp. vector fields) are the smooth complex-valued (resp. row vector valued) functions on a nonempty, open subset $M$ of the space $\mathbb{R}^{1\times n}$ of row vectors of length $n$. In this case, it is natural to identify $W^*$ with the vector space $C^\infty(M, \mathbb{C}^n)$ of covector-valued fields via

$$(X\zeta)(x) := X(x)\zeta(x)$$

for $X \in W = C^\infty(M, \mathbb{C}^{1\times n})$ and $\zeta \in W^* = C^\infty(M, \mathbb{C}^n)$. In particular, the gradient $df = \partial f$ appears naturally as an element of $W^*$, consistent with our abstract development.

Now let $\theta$ be a distinguished linear form. Then we can define its Jacobian, the $x$-dependent square array $\partial\theta$ whose entries are the partial derivatives

$$\partial\theta(x)_{jk} := \partial_j\theta_k(x) = \frac{\partial\theta_k(x)}{\partial x_j}.$$

We now consider the exact 2-form $\omega = -d\theta$ (the minus sign is traditional). We have

$$YX\omega = Yd(X\theta) - Xd(Y\theta) + (X\angle Y)\theta \quad \text{for } \omega = -d\theta, \tag{12.8}$$

since $YX\omega = -YX\,d\theta = -Y\big(L_X\theta - d(X\theta)\big) = -YL_X\theta + Yd(X\theta) = -L_X(Y\theta) + (L_XY)\theta + Yd(X\theta)$ by (11.31) and (11.18). It is not difficult to show that now

$$(X\omega)_k(x) = \sum_j X_j(x)\,\omega_{jk}(x), \tag{12.9}$$

where

$$\omega_{jk}(x) = \partial_k\theta_j(x) - \partial_j\theta_k(x) \tag{12.10}$$

are the components of the antisymmetric expression $(\partial\theta)^T - \partial\theta$ in the Jacobian of $\theta$.

As a consequence, $\omega = -d\theta$ is nondegenerate precisely when $(\partial\theta)^T - \partial\theta$ is nonsingular. If this holds, $\omega$ is a symplectic form, and all our results apply. This matches the present development with that found in standard treatises such as Marsden & Ratiu [165].
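The component formula (12.10) is easy to evaluate numerically. The Python sketch below is our own, and its names are hypothetical. It approximates the Jacobian $\partial\theta(x)_{jk} = \partial_j\theta_k(x)$ by central differences and forms $\omega_{jk} = \partial_k\theta_j - \partial_j\theta_k$. For $\theta = p\,dq$ on $\mathbb{R}^2$ with coordinates $x = (q, p)$, i.e. components $(\theta_1, \theta_2) = (p, 0)$, the result should be $\begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}$. This matrix is nonsingular, so $\omega = -d\theta$ is symplectic. For the exact form $\theta = d(\tfrac12 x\cdot x)$, the matrix vanishes, as (11.27) requires.

```python
def jacobian(theta, x, h=1e-6):
    """J[j][k] ~ d theta_k / d x_j, the array (d theta)_{jk}, by central differences."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        tp, tm = theta(xp), theta(xm)
        for k in range(n):
            J[j][k] = (tp[k] - tm[k]) / (2 * h)
    return J

def omega_components(theta, x):
    """omega_jk = d_k theta_j - d_j theta_k as in (12.10), i.e. the matrix J^T - J."""
    J = jacobian(theta, x)
    n = len(x)
    return [[J[k][j] - J[j][k] for k in range(n)] for j in range(n)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def theta_canonical(x):
    q, p = x
    return [p, 0.0]          # hypothetical example: theta = p dq

def theta_exact(x):
    return [x[0], x[1]]      # hypothetical example: theta = d(x.x / 2), an exact form

x0 = [0.3, -1.2]
W = omega_components(theta_canonical, x0)
print(W, det2(W))                          # about [[0, 1], [-1, 0]] and 1: nondegenerate
print(omega_components(theta_exact, x0))   # about zero: d(df) = 0
```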



12.2 Conservative Hamiltonian dynamics 

We now apply the results of Section 12.1 to classical Hamiltonian mechanics of conservative systems. The phase space of a classical system is the set of all states that may be attained in some realization of the system. We begin with the unconstrained case, where the phase space is a cotangent bundle over a manifold $M$, and then extend the discussion to the constrained case, where the phase space has a more complicated structure.
To avoid technicalities, we only treat the case where the manifold can be described by a single chart, so that it can be treated as an open subset of some topological vector space. However, using standard techniques from differential geometry, it is not difficult to lift the discussion to arbitrary manifolds. Thus, in the following, the configuration space $M_c$ is a nonempty, open subset of a convenient vector space $F$ over $\mathbb{R}$. Thinking of $M_c$ as a chart of a general manifold, everything we say here extends in a standard way to arbitrary $F$-manifolds in place of $M_c$.

We write the bilinear pairing between elements $q$ from $F$ and elements $p$ from the dual space $F^*$ as a product $p \cdot q = q \cdot p$. We extend this product linearly to the complexifications $\mathbb{C}F$ of $F$ and $\mathbb{C}F^*$ of $F^*$, and extend it further pointwise to $\mathbb{C}F$-valued or $\mathbb{C}F^*$-valued functions.

In this section, we consider the case of unconstrained dynamics. Here $E = C^\infty(M)$ and $W = C^\infty(M, \mathbb{C}F \times \mathbb{C}F^*)$ are the spaces of scalar fields and vector fields, respectively, on the cotangent bundle $M = T^*M_c := M_c \times F^*$ of $M_c$.

The reader may think of the Euclidean space $F = F^* = \mathbb{R}^n$ of vectors with $n$ real components and bilinear pairing $p \cdot q = \sum_k p_kq_k$. As discussed in Section 2.2, this accounts for the mechanics of point particles. For field theories, $F$ is an infinite-dimensional function space.

A classical, conservative, unconstrained mechanical system is defined by a
Hamiltonian $H \in E$, with the full cotangent bundle $M$ as the phase space of the
system. The point $x = (q,p) \in M$ is called the state with position $q \in M_c$ and
momentum $p \in F^*$. The energy of the system in the state $(q,p)$ is the value $H(q,p)$ of the
Hamiltonian at $(q,p)$.

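The state of the system varies with time $t$, which we consider to be a number in the
interval $[\underline t, \overline t]$, where $\underline t$ is the initial time and $\overline t > \underline t$ is the final time for which the system
is considered. The time dependence is modeled by a trajectory, a state-valued, arbitrarily
often differentiable function of time, mapping $t \in [\underline t, \overline t]$ to $(q(t), p(t)) \in M$. The position $q(t)$
and the momentum $p(t)$ at time $t$ are constrained by the Hamilton equations in state
form,

$$\dot q = \partial_p H(q,p), \qquad \dot p = -\partial_q H(q,p). \tag{12.11}$$

Here $\partial_p = \partial/\partial p$ and $\partial_q = \partial/\partial q$ denote the gradient with respect to momentum $p$ and
position $q$, respectively. Note that if $f$ is a scalar field then $\partial_p f(q,p) \in \mathbb{C}F$ and $\partial_q f(q,p) \in \mathbb{C}F^*$.

The Hamilton equations automatically imply the conservation of energy:

$$\frac{d}{dt}H(q,p) = \partial_q H\cdot\dot q + \partial_p H\cdot\dot p = \partial_q H\cdot\partial_p H - \partial_p H\cdot\partial_q H = 0.$$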
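The Hamilton equations (12.11) can also be integrated numerically. The following sketch is an illustration only, not part of the development above: it assumes a separable Hamiltonian $H(q,p) = T(p) + V(q)$ with one degree of freedom, takes the harmonic oscillator $H = p^2/2m + kq^2/2$ as a hypothetical example, and uses the Störmer-Verlet (leapfrog) scheme, a symplectic integrator whose energy error stays bounded instead of drifting. The function name `leapfrog` and all parameter values are assumptions.

```python
import math


def leapfrog(dV_dq, dT_dp, q, p, dt, steps):
    """Integrate q' = dH/dp, p' = -dH/dq for separable H(q, p) = T(p) + V(q).

    Uses the kick-drift-kick Stormer-Verlet (leapfrog) scheme.
    """
    for _ in range(steps):
        p = p - 0.5 * dt * dV_dq(q)  # half kick with the force -dV/dq
        q = q + dt * dT_dp(p)        # full drift with the velocity dT/dp
        p = p - 0.5 * dt * dV_dq(q)  # second half kick
    return q, p


# Harmonic oscillator H = p^2/(2m) + k q^2/2 (illustrative parameters).
m, k = 1.0, 1.0


def energy(q, p):
    return p * p / (2 * m) + 0.5 * k * q * q


q0, p0 = 1.0, 0.0
dt, steps = 0.01, 1000  # integrate up to t = 10
q1, p1 = leapfrog(lambda q: k * q, lambda p: p / m, q0, p0, dt, steps)

energy_error = abs(energy(q1, p1) - energy(q0, p0))
q_exact = q0 * math.cos(math.sqrt(k / m) * dt * steps)
print(f"q(10) = {q1:.6f}, exact {q_exact:.6f}, energy error {energy_error:.1e}")
```

A non-symplectic method such as explicit Euler would instead show a steadily growing energy error for the same step size.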

The Hamilton equations may be derived from a variational principle. We define the
action as the functional on smooth paths in $M$ defined by

$$I(q,p) := \int_{\underline t}^{\overline t} dt\,\big(p(t)\cdot\dot q(t) - H(q(t),p(t))\big), \tag{12.12}$$

and consider small variations $\delta q$ and $\delta p$ of the arguments $q$ and $p$, respectively. Since we do
not make further use of the principle, we assume without discussion that the integral
can be manipulated as accustomed from the finite-dimensional case, where $F = \mathbb{R}^n$. For
variations vanishing at $t = \underline t$ and $t = \overline t$, we have, up to higher order terms (integrating by parts in the first line),

$$I(q+\delta q,p) - I(q,p) \approx \int_{\underline t}^{\overline t} dt\,\big(p(t)\cdot\delta\dot q(t) - \partial_q H(q(t),p(t))\cdot\delta q(t)\big) = \int_{\underline t}^{\overline t} dt\,\big(-\dot p(t) - \partial_q H(q(t),p(t))\big)\cdot\delta q(t),$$

$$I(q,p+\delta p) - I(q,p) \approx \int_{\underline t}^{\overline t} dt\,\big(\dot q(t) - \partial_p H(q(t),p(t))\big)\cdot\delta p(t),$$

so that the path $(q,p)$ is a stationary point of the action if and only if the Hamilton
equations (12.11) hold.



A vector field $X \in W$ is a pair of functions $X = (X^q, X^p) \in C^\infty(M, \mathbb{C}F) \times C^\infty(M, \mathbb{C}F^*)$; its
value at the state $(q,p)$ is $X(q,p) = (X^q(q,p), X^p(q,p))$. Associated with each vector field
$X$ is the derivation $X\partial$ defined by

$$X\partial f := X^q\cdot\partial_q f + X^p\cdot\partial_p f.$$

Using the mapping $\partial$ defined in this way, it is easily checked that we have a commutative
differential geometry. In particular, a general linear form $\zeta$ is described by a pair of functions
$(\zeta_q, \zeta_p) \in C^\infty(M, \mathbb{C}F^*) \times C^\infty(M, \mathbb{C}F)$ such that

$$X\zeta = X^q\cdot\zeta_q + X^p\cdot\zeta_p. \tag{12.13}$$

12.2.1 Theorem. Let $\theta = (p, 0)$ be the linear form defined by

$$(X\theta)(q,p) := X^q(q,p)\cdot p. \tag{12.14}$$

Then $\omega := -d\theta$ is an exact symplectic form satisfying

$$(YX\omega)(q,p) = X^q(q,p)\cdot Y^p(q,p) - Y^q(q,p)\cdot X^p(q,p) \tag{12.15}$$

for arbitrary vector fields $X, Y$. Its inverse satisfies

$$X = \zeta\omega^{-1} \iff X^q = \zeta_p, \quad X^p = -\zeta_q \tag{12.16}$$

for arbitrary linear forms $\zeta$. With the Lie product

$$f\angle g := df\,\omega^{-1}dg = \partial_p f\cdot\partial_q g - \partial_p g\cdot\partial_q f, \tag{12.17}$$

the algebra $E = C^\infty(M)$ of scalar fields on phase space $M$ is a Poisson algebra. $\angle$, $\omega$,
and $\theta$ are called the canonical Lie product, the canonical symplectic form, and the
canonical linear form on phase space $M$.

(In the notation using components and the Einstein summation convention, we have $\theta = p_j\,dq^j$ and
$\omega = dq^j\wedge dp_j$. Here the linear forms $dq^j$ and $dp_j$, given by $X\,dq^j := (X^q)^j$ and $X\,dp_j := (X^p)_j$, are the
gradients of the functions $q^j$ and $p_j$ mapping a general state $(q,p)$ to the indicated components.)

Proof. $\omega$ is an exact 2-form since $\omega = d(-\theta)$. To prove (12.15), we use (12.8) to work out

$$\begin{aligned}(YX\omega)(q,p) &= Y\partial(X\theta) - X\partial(Y\theta) + (X\angle Y)\theta = Y\partial(X^q\cdot p) - X\partial(Y^q\cdot p) + (X\angle Y)^q\cdot p\\ &= Y^q\partial_q(X^q\cdot p) + Y^p\partial_p(X^q\cdot p) - X^q\partial_q(Y^q\cdot p) - X^p\partial_p(Y^q\cdot p) + (X\angle Y)^q\cdot p.\end{aligned}$$

Using $(X\angle Y)^q = X\partial Y^q - Y\partial X^q = X^q\partial_q Y^q + X^p\partial_p Y^q - Y^q\partial_q X^q - Y^p\partial_p X^q$, which follows from (11.4), the product
rule, and $\partial_p p = 1$, everything cancels except for $Y^p\cdot X^q - X^p\cdot Y^q$. This proves (12.15). By
comparing (12.13) with (12.15), we see that

$$\zeta = X\omega \iff \zeta_q = -X^p, \quad \zeta_p = X^q. \tag{12.18}$$

Thus $\omega$ maps $X = (X^q, X^p)$ to $(-X^p, X^q)$, corresponding to right multiplication by the
matrix $\begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}$. Since equations (12.18) are uniquely solvable for $X$ by (12.16), we
conclude that $\omega$ is nondegenerate, hence symplectic.

Since every exact 2-form is closed, Theorem 12.1.3 applies and gives the final assertion.

□



Using the chain rule, the dynamics (12.11) is easily seen to be equivalent to the Hamilton
equations in general form,

$$\dot f = \frac{d}{dt}f = H\angle f; \tag{12.19}$$

cf. Section 2.2.
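The Lie product (12.17) and the general form (12.19) of the Hamilton equations can be checked numerically. The sketch below is an illustration under assumptions not made in the text: phase space $\mathbb{R}^2\times\mathbb{R}^2$, gradients approximated by central differences, and the hypothetical Hamiltonian $H = \frac12|p|^2 + q_1^2q_2 + \frac14q_2^4$. It verifies $p_1\angle q_1 = 1$ (the sign convention of (12.17)) and that $H\angle f$ reproduces $\dot f$ computed by the chain rule along (12.11) for $f = q_1p_2$.

```python
def grad(f, x, h=1e-5):
    """Central-difference gradient of the scalar function f at the point x (a list)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def lie_product(f, g, q, p):
    """Canonical Lie product (12.17): f∠g = ∂_p f · ∂_q g − ∂_p g · ∂_q f."""
    dq_f = grad(lambda x: f(x, p), q)
    dp_f = grad(lambda y: f(q, y), p)
    dq_g = grad(lambda x: g(x, p), q)
    dp_g = grad(lambda y: g(q, y), p)
    return dot(dp_f, dq_g) - dot(dp_g, dq_f)


def H(q, p):
    # Hypothetical Hamiltonian: H = |p|^2/2 + q1^2 q2 + q2^4/4
    return 0.5 * dot(p, p) + q[0] ** 2 * q[1] + 0.25 * q[1] ** 4


def f(q, p):
    return q[0] * p[1]


q, p = [0.3, -0.7], [1.1, 0.4]

bracket_pq = lie_product(lambda q, p: p[0], lambda q, p: q[0], q, p)  # expected: 1

# Chain rule along (12.11): q' = p, p' = -dV/dq, hence
# d/dt (q1 p2) = q1' p2 + q1 p2' = p1 p2 - q1 (q1^2 + q2^3).
fdot_chain = p[0] * p[1] - q[0] * (q[0] ** 2 + q[1] ** 3)
fdot_bracket = lie_product(H, f, q, p)
print(bracket_pq, fdot_chain, fdot_bracket)
```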



12.3 Constrained Hamiltonian dynamics 

In the constrained case, additional parameters (e.g., Lagrange multipliers) are needed to
describe the possible states of the system. Therefore we take $E = C^\infty(M \times U)$ and
$W = C^\infty(M \times U, \mathbb{C}F \times \mathbb{C}F^* \times \mathbb{C}U)$, the spaces of scalar fields and vector fields, respectively, on
an augmented cotangent bundle $M \times U$ of $M_c$, where, as before, the phase space is
$M = M_c \times F^*$, and $U$ is a convenient vector space.

A classical, conservative, constrained mechanical system is again defined by a
Hamiltonian $H \in E$. The point $x = (q,p,u) \in M \times U$ is called the state with position $q \in M_c$,
momentum $p \in F^*$, and parameter $u \in U$; however, due to the constraints derived below
from $H$, not all points in $M \times U$ are physical. As we shall see, the accessible phase space
may also be smaller than $M$. The energy of the system in the state $(q,p,u)$ is the value
$H(q,p,u)$ of the Hamiltonian at $(q,p,u)$.

The state of the system again varies with time $t \in [\underline t, \overline t]$. The time dependence is modeled by
a trajectory, a state-valued, arbitrarily often differentiable function of time, now defining
position $q(t)$, momentum $p(t)$, and parameter $u(t)$ at time $t$. These are constrained by the
extended Hamiltonian equations,

$$\dot q = \partial_p H, \qquad \dot p = -\partial_q H, \qquad 0 = \partial_u H. \tag{12.20}$$

Here $\partial_u = \partial/\partial u$ denotes the gradient operator with respect to the parameter $u$. Thus,
in place of a system of ordinary differential equations in the unconstrained case, we now
have a system of differential-algebraic equations (DAE) involving the holonomic
constraints

$$0 = \partial_u H(q,p,u). \tag{12.21}$$

Again the extended Hamiltonian equations automatically imply the conservation of energy:
$\frac{d}{dt}H(q,p,u) = \partial_q H\cdot\dot q + \partial_p H\cdot\dot p + \partial_u H\cdot\dot u = 0$, since the first two terms cancel as before and $\partial_u H = 0$.

The case where the symmetric Hessian matrix

$$G := \partial_u^2 H(q,p,u)$$

is invertible is referred to as the regular case. Then, by the implicit function theorem,
(12.21) can be solved locally uniquely for $u = u(q,p)$, which implies that (12.20) may be
viewed as an ordinary differential equation in $q$ and $p$ alone. In the singular case, where
the Hessian $G$ is not invertible, the constraints imply restrictions on $p$. Thus, not the whole
phase space is dynamically accessible, and the analysis of solvability of the DAE is more
involved. The details depend on the so-called index of a DAE (index 1 corresponds to
the regular case, index $> 1$ to the singular case) and are beyond our treatment.

12.3.1 Example. We consider the constrained Hamiltonian system with $F = \mathbb{R}^3$ and $U = \mathbb{R}$,
defined by the Hamiltonian

$$H(q,p,u) := \tfrac12 p^2 + V(k\times q) - (k\cdot p)u,$$

where $V$ is a potential energy function. The special case $V(B) := \frac12 B^2$ describes the
dynamics of a single Fourier mode with wave vector $k$ of the free electromagnetic field. A
straightforward calculation gives the dynamics

$$\dot q = p - ku, \qquad \dot p = k\times\nabla V(k\times q), \qquad 0 = k\cdot p.$$

Since $G = \partial_u^2 H = 0$, this is a singular case. Indeed, the dynamically relevant part of
the phase space is characterized by the transversality condition $0 = k\cdot p$, whereas the
multiplier $u$ is completely undetermined by the dynamics. This implies that the dynamics
of $q$ is determined only up to an arbitrary multiple of $k$; in other words, only $k\times q$ is
determined at all times by the initial conditions.

Note that $H\angle(k\cdot p) = k\cdot(H\angle p) = k\cdot\dot p = 0$, since $\dot p = k\times\nabla V(k\times q)$ is orthogonal to $k$; hence the constraint $0 = k\cdot p$ is automatically
satisfied at all times if it is satisfied at some time. Thus, in the terminology of constrained
mechanics, it is called a first class constraint, and gives rise to gauge symmetries. A
gauge transform replaces $q$ by $q + ks(q)$ with an arbitrary scalar field $s(q)$, and leaves
everything of dynamical interest invariant. The gauge invariant quantities are those in
the centralizer $C(k\cdot p)$ of the constraint. $f$ belongs to the centralizer iff it Lie commutes
with $k\cdot p$, which is the case iff $k\cdot\partial_q f = 0$, hence iff $f$ depends only on $p$ and $k\times q$. Thus,
the centralizer consists of all smooth functions of

$$B := k\times q, \qquad E := -p,$$

and the dynamics of the gauge invariant quantities is determined by

$$\dot B = -k\times E, \qquad \dot E = -k\times\nabla V(B), \qquad 0 = k\cdot E. \tag{12.22}$$
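The gauge freedom in Example 12.3.1 can be observed numerically. In the following sketch (an illustration, not taken from the text) we take $V(B) = \frac12|B|^2$, so that $\nabla V(B) = B$, choose the hypothetical wave vector $k = (0,0,1)$ and initial data with $k\cdot p = 0$, and integrate the dynamics with a classical Runge-Kutta scheme for two different choices of the undetermined multiplier, $u(t) = 0$ and $u(t) = \sin 3t$. The gauge invariant quantities $B = k\times q$ and $E = -p$ agree, and $k\cdot p$ stays zero, while $q$ itself differs by a multiple of $k$.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def field(q, p, u, k):
    """Example 12.3.1 with V(B) = |B|^2/2:  q' = p - k u,  p' = k x grad V(k x q) = k x (k x q)."""
    dq = [pi - ki * u for pi, ki in zip(p, k)]
    dp = cross(k, cross(k, q))
    return dq, dp


def shift(x, dx, s):
    return [xi + s * dxi for xi, dxi in zip(x, dx)]


def integrate(q, p, k, u_of_t, dt, steps):
    """Classical RK4 for (q, p) with a prescribed, otherwise arbitrary multiplier u(t)."""
    t = 0.0
    for _ in range(steps):
        k1 = field(q, p, u_of_t(t), k)
        k2 = field(shift(q, k1[0], dt / 2), shift(p, k1[1], dt / 2), u_of_t(t + dt / 2), k)
        k3 = field(shift(q, k2[0], dt / 2), shift(p, k2[1], dt / 2), u_of_t(t + dt / 2), k)
        k4 = field(shift(q, k3[0], dt), shift(p, k3[1], dt), u_of_t(t + dt), k)
        q = [x + dt / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(q, k1[0], k2[0], k3[0], k4[0])]
        p = [x + dt / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1[1], k2[1], k3[1], k4[1])]
        t += dt
    return q, p


k = [0.0, 0.0, 1.0]                          # hypothetical wave vector
q0, p0 = [0.2, -0.1, 0.5], [0.3, 0.4, 0.0]   # initial data with k.p = 0

qa, pa = integrate(q0, p0, k, lambda t: 0.0, 1e-3, 1000)
qb, pb = integrate(q0, p0, k, lambda t: math.sin(3 * t), 1e-3, 1000)

B_a, B_b = cross(k, qa), cross(k, qb)
print("B for u = 0:       ", B_a)
print("B for u = sin(3t): ", B_b)
print("shift of q along k:", qb[2] - qa[2])  # approximately -(1 - cos 3)/3
```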
The extended Hamiltonian equations may also be derived from a variational principle. Now
the action is defined on smooth paths in $M \times U$,

$$I(q,p,u) := \int_{\underline t}^{\overline t} dt\,\big(p(t)\cdot\dot q(t) - H(q(t),p(t),u(t))\big). \tag{12.23}$$

Variations of the arguments show as before that the path $(q,p,u)$ is a stationary point of
the action if and only if the extended Hamiltonian equations (12.20) hold; the constraint
equations derive from

$$I(q,p,u+\delta u) - I(q,p,u) \approx -\int_{\underline t}^{\overline t} dt\,\partial_u H(q(t),p(t),u(t))\cdot\delta u(t).$$

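A vector field $X \in W$ is now a triple of functions

$$X = (X^q, X^p, X^u) \in C^\infty(M\times U, \mathbb{C}F) \times C^\infty(M\times U, \mathbb{C}F^*) \times C^\infty(M\times U, \mathbb{C}U);$$

its value at the state $(q,p,u)$ is $X(q,p,u) = (X^q(q,p,u), X^p(q,p,u), X^u(q,p,u))$. Associated
with each vector field $X$ is the derivation $X\partial$ defined by

$$X\partial f := X^q\cdot\partial_q f + X^p\cdot\partial_p f + X^u\cdot\partial_u f.$$

It is again easy to check that this defines a commutative differential geometry. In particular,
a general linear form $\zeta$ is described by a triple of functions $(\zeta_q, \zeta_p, \zeta_u) \in C^\infty(M\times U, \mathbb{C}F^*) \times C^\infty(M\times U, \mathbb{C}F) \times C^\infty(M\times U, \mathbb{C}U^*)$ such that

$$X\zeta = X^q\cdot\zeta_q + X^p\cdot\zeta_p + X^u\cdot\zeta_u. \tag{12.24}$$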

In analogy to the unconstrained case, we define the linear form $\theta$ by

$$(X\theta)(q,p,u) := X^q(q,p,u)\cdot p.$$

Thus $\theta = (p,0,0)$, and a similar calculation as before gives the exact 2-form

$$\omega := -d\theta$$

with

$$(YX\omega)(q,p,u) = X^q(q,p,u)\cdot Y^p(q,p,u) - Y^q(q,p,u)\cdot X^p(q,p,u).$$

Since no differentiation by the parameters $u$ is involved, the 2-form $\omega$ is now degenerate
and hence no longer symplectic. As a result, $E(\omega)$ is strictly smaller than $E$; a scalar field $f$
is found to be compatible with $\omega$ and hence in $E(\omega)$ only if $\partial_u f = 0$, i.e., $f$ is independent
of $u$. Thus $E(\omega) = C^\infty(M)$ is again the Poisson algebra of scalar fields on phase space, with
Lie product (12.17).

The Hamilton equations (12.19) remain valid, too; note that by the general theory, $H\angle f \in E(\omega)$,
although $H$ depends on $u$.







CHAPTER 12. CONSERVATIVE MECHANICS ON MANIFOLDS 



If the Hamiltonian $H(q,p,u) = H(q,p)$ is independent of $u$, everything reduces to what
we said about unconstrained Hamiltonian mechanics. Constrained Hamiltonian mechanics
with unconstrained Hamiltonian $H_0(q,p)$ and $u$-independent holonomic constraints $C(q,p) = 0$
is obtained by introducing a vector $u$ of Lagrange multipliers for the constraints and
defining $H(q,p,u) = H_0(q,p) + C(q,p)\cdot u$. Note that $\partial_u H(q,p,u) = C(q,p)$ simply recovers
the holonomic constraints. Thus, we see that the components of $u$ which occur only linearly
in $H$ behave as multipliers of $u$-independent holonomic constraints.
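As an illustration of multipliers for $u$-independent holonomic constraints (an example of our own, not from the text), consider a free particle in the plane, $H_0 = \frac12|p|^2$, constrained to the unit circle by $C(q) = \frac12(|q|^2 - 1) = 0$, so that $H = H_0 + C(q)u$ and (12.20) reads $\dot q = p$, $\dot p = -qu$, $0 = C(q)$. Since $G = \partial_u^2H = 0$, this is a singular case. The sketch assumes the standard index-reduction trick of differentiating the constraint twice, which gives $q\cdot p = 0$ and then $|p|^2 - |q|^2u = 0$, i.e., $u = |p|^2/|q|^2$; the names and numbers are hypothetical.

```python
import math


def field(q, p):
    """q' = p, p' = -q u with the multiplier u = |p|^2/|q|^2 obtained by index reduction."""
    u = (p[0] ** 2 + p[1] ** 2) / (q[0] ** 2 + q[1] ** 2)
    return [p[0], p[1]], [-q[0] * u, -q[1] * u]


def shift(x, dx, s):
    return [xi + s * dxi for xi, dxi in zip(x, dx)]


def rk4_step(q, p, dt):
    k1 = field(q, p)
    k2 = field(shift(q, k1[0], dt / 2), shift(p, k1[1], dt / 2))
    k3 = field(shift(q, k2[0], dt / 2), shift(p, k2[1], dt / 2))
    k4 = field(shift(q, k3[0], dt), shift(p, k3[1], dt))
    q = [x + dt / 6 * (a + 2 * b + 2 * c + d) for x, a, b, c, d in zip(q, k1[0], k2[0], k3[0], k4[0])]
    p = [x + dt / 6 * (a + 2 * b + 2 * c + d) for x, a, b, c, d in zip(p, k1[1], k2[1], k3[1], k4[1])]
    return q, p


omega = 1.5                       # angular velocity of the initial state
q, p = [1.0, 0.0], [0.0, omega]   # on the circle C(q) = 0, with q.p = 0
dt, steps = 1e-3, 2000            # integrate up to t = 2
for _ in range(steps):
    q, p = rk4_step(q, p, dt)

t = dt * steps
constraint = 0.5 * (q[0] ** 2 + q[1] ** 2 - 1.0)   # C(q), should stay close to 0
print("C(q) =", constraint)
print("q =", q, " exact:", [math.cos(omega * t), math.sin(omega * t)])
```

The multiplier comes out as the constant $u = \omega^2$, the centripetal term of uniform circular motion.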



Note that there is another class of models for conservative Hamiltonian dynamics, defined
by so-called nonholonomic constraints. There the constrained dynamics is not given
by (12.20) but instead by

$$\dot q = \partial_p H(q,p), \qquad \dot p = -\partial_q H(q,p) + A(q)u, \qquad 0 = \partial_p H(q,p)\cdot A(q),$$

where $A(q)$ maps a multiplier vector $u \in U$ to an element from $F^*$, and the Hamiltonian
$H$ again defines the energy. The energy is conserved since $\frac{d}{dt}H(q,p) = \partial_q H(q,p)\cdot\dot q + \partial_p H(q,p)\cdot\dot p = \partial_p H(q,p)\cdot A(q)u = 0$.
However, now the dynamics can usually no longer
be written in terms of a variational principle. Only the special integrable case where
$A(q) = \partial_q C(q)$ corresponds to holonomic constraints of the form $C(q) = 0$ and a modified
Hamiltonian $\tilde H(q,p,u) := H(q,p) - C(q)\cdot u$, which agrees with $H$ on the space of trajectories.
The most general conservative Hamiltonian system may have both holonomic and
nonholonomic constraints; the reader may wish to write down the defining equations and
generalize the above discussion accordingly.
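Energy conservation under nonholonomic constraints can be checked on a standard nonintegrable example, a free particle in $\mathbb{R}^3$ with $H = \frac12|p|^2$ and $A(q) = (q_2, 0, -1)$, so that the constraint $\partial_pH\cdot A(q) = 0$ reads $p_1q_2 - p_3 = 0$ (the so-called nonholonomic particle; both the example and the method are our assumptions, not from the text). The sketch determines the multiplier by requiring that the constraint be preserved in time: differentiating $p\cdot A(q) = 0$ along $\dot q = p$, $\dot p = A(q)u$ gives $u\,|A(q)|^2 + p_1p_2 = 0$. It then integrates with a Runge-Kutta scheme and monitors the energy and the constraint.

```python
def A(q):
    """Constraint direction A(q) = (q2, 0, -1); the distribution p.A(q) = 0 is not integrable."""
    return [q[1], 0.0, -1.0]


def field(state):
    """(q', p') = (dH/dp, -dH/dq + A(q) u) for H = |p|^2/2, with u keeping d/dt (p.A) = 0."""
    q, p = state[:3], state[3:]
    a = A(q)
    u = -p[0] * p[1] / (a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return p + [ai * u for ai in a]


def rk4_step(state, dt):
    k1 = field(state)
    k2 = field([s + dt / 2 * k for s, k in zip(state, k1)])
    k3 = field([s + dt / 2 * k for s, k in zip(state, k2)])
    k4 = field([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6 * (a + 2 * b + 2 * c + d) for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def energy(state):
    return 0.5 * sum(x * x for x in state[3:])


def constraint(state):
    q, p = state[:3], state[3:]
    return sum(pi * ai for pi, ai in zip(p, A(q)))


state = [0.0, 0.5, 0.0, 1.0, 0.3, 0.5]   # q = (0, 0.5, 0), p = (1, 0.3, 0.5): p1 q2 - p3 = 0
E0 = energy(state)
for _ in range(5000):                    # dt = 1e-3, up to t = 5
    state = rk4_step(state, 1e-3)
print("energy drift:", energy(state) - E0, " constraint:", constraint(state))
```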



12.4 Lagrangian mechanics 

Frequently, and especially in relativistic field theory, a classical system is defined in terms
of the Lagrangian approach to mechanics. We consider here the autonomous case only,
where the Lagrangian is time-independent.

The basic object is now a Lagrangian $L \in C^\infty(TM_c)$, a function of points in the tangent
bundle $TM_c$ of a configuration manifold $M_c$. As in the Hamiltonian case, we restrict our
attention to the case where $M_c$ is a nonempty, open subset of a convenient vector space $F$
over $\mathbb{R}$. Then the tangent bundle is $TM_c = M_c \times F$, points in $TM_c$ are pairs $(q,v)$ consisting
of a configuration point $q \in M_c$ and a tangent vector $v \in F$ at $q$, referred to as velocity,
and the Lagrangian is a function with function values $L(q,v)$.

The Lagrangian approach to mechanics can be represented in the framework of constrained
Hamiltonian dynamics by taking $U = F$ and $u = v$. Then the choice

$$H(q,p,v) := v\cdot p - L(q,v) \tag{12.25}$$

for the Hamiltonian gives unconstrained Lagrangian mechanics. Constrained Lagrangian
mechanics with holonomic constraints $C(q,v) = 0$ is similarly obtained by taking $U = F \times U_0$
and $u = (v, u_0)$ and $H(q,p,v,u_0) := p\cdot v - L(q,v) + u_0\cdot C(q,v)$, where $u_0$ is a Lagrange
multiplier. However, in the following, we only discuss the unconstrained Lagrangian case.

Applying the general machinery of Section 12.3 to (12.25), we find as dynamical equations
the Euler-Lagrange equations

$$\dot q = v, \qquad p = \partial_v L(q,v), \qquad \dot p = \partial_q L(q,v); \tag{12.26}$$

and the action (12.12) reduces on the submanifold defined by $\dot q = v$ to

$$I(q) := \int_{\underline t}^{\overline t} dt\,L(q(t),\dot q(t)). \tag{12.27}$$

The Hamiltonian (12.25) is conserved along solutions since

$$(p\cdot\dot q)^\cdot = \dot p\cdot\dot q + p\cdot\ddot q = L_q\cdot\dot q + L_{\dot q}\cdot\ddot q = L(q,\dot q)^\cdot = \dot L,$$

so that $\frac{d}{dt}(p\cdot\dot q - L) = 0$.

It is easily verified directly that the condition for $I(q)$ to be stationary at the path $q$ gives
again the Euler-Lagrange equations (12.26); this is usually taken as the starting point of
the Lagrangian approach.

12.4.1 Example. The Lagrangian $L(q,v) = \frac12 mv^2 - \frac12 kq^2$ defines the harmonic oscillator,
as can be seen by writing down the Euler-Lagrange equations. Note that the action need
not be bounded below, as can be seen from the path $q(t) = s(1-t^2)$ in $[\underline t, \overline t] = [-1,1]$, where
$I(q) = (\frac43 m - \frac8{15}k)s^2$ diverges to $-\infty$ when $k > 2.5m$ and $s \to \infty$. Thus, it is inappropriate
to refer to the stationary action principle as the principle of least action, as is often done for
historical reasons.
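The value of the action in Example 12.4.1 can be confirmed by quadrature. The sketch below assumes $L(q,v) = \frac12mv^2 - \frac12kq^2$ as in the example and approximates $I(q) = \int_{-1}^{1}dt\,L(q(t),\dot q(t))$ for the path $q(t) = s(1-t^2)$ with the composite Simpson rule; the function names are hypothetical. For this polynomial integrand Simpson's rule is accurate far beyond the printed digits, and the result matches $(\frac43m - \frac8{15}k)s^2$.

```python
def action(m, k, s, n=2000):
    """Simpson approximation of I = ∫_{-1}^{1} (m v^2/2 - k q^2/2) dt for q(t) = s(1 - t^2)."""
    def lagrangian(t):
        q = s * (1 - t * t)
        v = -2 * s * t          # q'(t)
        return 0.5 * m * v * v - 0.5 * k * q * q

    t0, t1 = -1.0, 1.0
    h = (t1 - t0) / n           # n must be even for Simpson's rule
    total = lagrangian(t0) + lagrangian(t1)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * lagrangian(t0 + i * h)
    return total * h / 3


def closed_form(m, k, s):
    return (4 * m / 3 - 8 * k / 15) * s * s


for s in (1.0, 10.0, 100.0):
    # k = 3m exceeds 2.5m, so the action decreases without bound as s grows
    print(s, action(1.0, 3.0, s), closed_form(1.0, 3.0, s))
```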

If we change a Lagrangian $L(q,v)$ to

$$\tilde L(q,v) := L(q,v) + v\cdot\partial_q\phi(q)$$

for some smooth function $\phi$, the action $I(q)$ remains unchanged apart from a boundary term
arising through integration by parts. As a result, the new equations of motion and the old
ones are equivalent. On the other hand, the momentum changes from $p$ to $\tilde p = p + \partial_q\phi(q)$.
This does not affect the equation of motion in the form (12.19) since the transformation
from $p$ to $\tilde p$ is a canonical transformation leaving the Lie product invariant. Indeed, it is
not difficult to see that the more general substitution of $p' = p + \chi(q)$ for $p$ preserves the
Lie product iff $\chi'(q) = \partial_q\chi(q)$ is symmetric. Necessity follows since for constants $a, b \in F$,

$$a\cdot p' \angle b\cdot p' = a\cdot\chi'(q)b - b\cdot\chi'(q)a$$

must vanish, and sufficiency can be established by a more involved computation.
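The symmetry criterion for $p' = p + \chi(q)$ can be tested numerically in two dimensions. This sketch is an illustration under our own assumptions (central-difference gradients, hypothetical choices of $\chi$): it compares the gradient field $\chi = \partial_q\phi$ for $\phi(q) = q_1^2q_2$, whose Jacobian is symmetric, with the rotation field $\chi(q) = (-q_2, q_1)$, whose Jacobian is antisymmetric. By (12.17), $p_1'\angle p_2' = \partial_{q_1}\chi_2 - \partial_{q_2}\chi_1$, which is $0$ and $2$, respectively.

```python
def grad(f, x, h=1e-5):
    """Central-difference gradient of the scalar function f at x (a list)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def lie_product(f, g, q, p):
    """Canonical Lie product (12.17): f∠g = ∂_p f · ∂_q g − ∂_p g · ∂_q f."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return (dot(grad(lambda y: f(q, y), p), grad(lambda x: g(x, p), q))
            - dot(grad(lambda y: g(q, y), p), grad(lambda x: f(x, p), q)))


def shifted_momenta(chi):
    """Components p'_j = p_j + chi_j(q), returned as functions of (q, p)."""
    return [lambda q, p, j=j: p[j] + chi(q)[j] for j in range(2)]


def chi_gradient(q):
    return [2 * q[0] * q[1], q[0] ** 2]   # gradient of phi(q) = q1^2 q2: symmetric Jacobian


def chi_rotation(q):
    return [-q[1], q[0]]                  # antisymmetric Jacobian


q, p = [0.4, -1.2], [0.7, 0.2]
for name, chi in [("gradient", chi_gradient), ("rotation", chi_rotation)]:
    print(name, lie_product(*shifted_momenta(chi), q, p))
```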

We may work directly in the tangent manifold and define the linear form $\theta_L$ by

$$\theta_L := p\,dq, \qquad \theta_L(X) := X^q\cdot p, \qquad p := \partial_{\dot q}L, \tag{12.28}$$

and the canonical 2-form

$$\omega_L := -d\theta_L.$$

Then $d\omega_L = -dd\theta_L = 0$ implies that $\omega_L$ is closed, and hence Theorem 12.1.3 applies. If
$\omega_L$ is nondegenerate, we can solve for $\dot q$ in terms of $p$ and $q$ and get the Hamiltonian picture
in the traditional way. The Poisson algebra becomes the standard Poisson algebra on the
cotangent bundle. If $\omega_L$ is degenerate, we cannot solve for $\dot q$, and compatibility restricts the
space $E(\omega_L)$ of quantities. Here

$$p = \partial_{\dot q}L$$

is the canonical momentum.

On $E = C^\infty(TM_c)$, any Lagrangian $L = L(q,\dot q)$ defines a Lie product on $E(\omega_L)$ which
induces the Euler-Lagrange dynamics defined by the action $I = \int dt\,L$.

We rearrange the canonical 2-form as

$$\omega_L = dq\wedge dp = dq\wedge(p_q\,dq + p_{\dot q}\,d\dot q).$$

The condition for $f \in E(\omega_L)$ to be compatible with $\omega_L$ requires the existence of a Hamiltonian
vector field $X_f$ with

^^Gdq{Xf), ^^-Gdq{Xf), (12.29) 
with the symmetric Hessian matrix

$$G := \partial_{\dot q}^2 L = \partial_{\dot q}\partial_{\dot q}L = \frac{\partial p}{\partial\dot q}. \tag{12.30}$$

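Case 1. In the regular case, i.e., if the Hessian matrix is invertible, we can solve the
constraint equation $p = \partial_{\dot q}L$ at least locally for $\dot q$, getting an equation $\dot q = v(q,p)$. In this
case, we find from (12.29) that

where $f, g$ are functions of $q$ and $\dot q$. Note that

$$L_v(q, v(q,p)) = p,$$

and

$$H(q,p) = p\cdot v(q,p) - L(q, v(q,p))$$

has derivatives

$$H_p = v(q,p) + p\cdot v_p(q,p) - L_v(q,v(q,p))\cdot v_p(q,p) = v(q,p) = \dot q,$$
$$H_q = p\cdot v_q(q,p) - L_q(q,v(q,p)) - L_v(q,v(q,p))\cdot v_q(q,p) = -L_q(q,\dot q) = -\dot p,$$

so that

$$\frac{d}{dt}f(q,p) = f_p\dot p + f_q\dot q = -f_pH_q + f_qH_p = H\angle f,$$

with the canonical Lie product on phase space.
Since $H = p\cdot\dot q - L$, we have for solutions $q$

$$\frac{\partial H}{\partial q} = \frac{\partial\dot q}{\partial q}\cdot p - \frac{\partial L}{\partial q} - \frac{\partial\dot q}{\partial q}\cdot\frac{\partial L}{\partial\dot q} = -\frac{\partial L}{\partial q}, \qquad \frac{\partial H}{\partial p} = \dot q + \frac{\partial\dot q}{\partial p}\cdot p - \frac{\partial\dot q}{\partial p}\cdot\frac{\partial L}{\partial\dot q} = \dot q.$$

Hence

$$\dot q = \frac{\partial H}{\partial p}$$

and

$$\dot p = \frac{d}{dt}\frac{\partial L}{\partial\dot q} = \frac{\partial L}{\partial q} = -\frac{\partial H}{\partial q},$$

so that $H$ generates the dynamics.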
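In the regular case the Hamiltonian can be computed from the Lagrangian numerically, by solving $p = \partial_vL(q,v)$ for $v$ and forming $H = p\cdot v - L$. The sketch below is an illustration under stated assumptions: one degree of freedom and the relativistic Lagrangian $L(q,v) = -m\sqrt{1-v^2} - \frac12q^2$ (units with $c = 1$), for which $G = \partial_v^2L > 0$, so $\partial_vL$ is increasing and bisection finds the unique root; the helper names are hypothetical. It compares the result with the closed form $H = \sqrt{p^2+m^2} + \frac12q^2$ and checks $H_p = v(q,p) = \dot q$ by a central difference.

```python
import math

M = 1.0  # rest mass (illustrative)


def L(q, v):
    return -M * math.sqrt(1.0 - v * v) - 0.5 * q * q


def dL_dv(q, v):
    return M * v / math.sqrt(1.0 - v * v)


def velocity(q, p, tol=1e-15):
    """Solve p = dL/dv(q, v) for v in (-1, 1) by bisection (dL/dv is increasing)."""
    a, b = -1.0, 1.0
    while b - a > tol:
        mid = 0.5 * (a + b)
        if dL_dv(q, mid) < p:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def hamiltonian(q, p):
    """Legendre transform H(q, p) = p v(q, p) - L(q, v(q, p))."""
    v = velocity(q, p)
    return p * v - L(q, v)


q, p = 0.3, 2.0
H_numeric = hamiltonian(q, p)
H_exact = math.sqrt(p * p + M * M) + 0.5 * q * q
h = 1e-5
H_p = (hamiltonian(q, p + h) - hamiltonian(q, p - h)) / (2 * h)  # should equal v(q, p)
print(H_numeric, H_exact, H_p, velocity(q, p))
```

Because $H$ is stationary in $v$ at the root, small errors in the computed velocity affect $H$ only to second order.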



Case 2. In the singular case, i.e., when the Hessian matrix (12.30) is not invertible,
condition (12.1) is nontrivial: not all $f(q,\dot q)$ are compatible with $\omega_L$ and hence in the Poisson
algebra. Then (12.31) only holds for the generalized inverse, and (12.29) requires that the
partial derivatives are in the range of $G$. The Poisson manifold (or orbifold?) is the set
of orbits of the gauge group; cf. M/R p. 325. Restrict $E$ accordingly, as in the symplectic
case.



The resulting Lie product (cf. (12.4)) is 

f^g = dg{Xf) - 



^^dq{Xj) + ^dq{X,). 



(12.32)



Note that the standard treatment in terms of symplectic manifolds requires regularity. In 
the singular case, complicated additional assumptions and arguments are needed to bring 
theories with gauge symmetries (which are always singular) into the framework of symplectic 
geometry. 






Chapter 13 



Hamiltonian quantum mechanics 



In this chapter, Hamiltonian quantum mechanics is described in differential geometric, 
classical terms. In particular, this enables one to formulate dynamics for mixed quantum- 
classical systems in which - as in the Born-Oppenheimer approximation in quantum chem- 
istry - slow degrees of freedom are modelled classically, while the fast motion (typically of 
electrons) is modelled by quantum mechanics. 

Also discussed is the relation between classical mechanics and quantum mechanics in terms 
of quantization procedures. 



13.1 Quantum dynamics as symplectic motion 

As a particular case of dynamics in the Poisson algebra of a symplectic form, we discuss
here the dynamics of wave functions and expectations in quantum mechanics.

We consider the special case of the unconstrained setting of Section 12.2 where $M_c = F$.
Then $\mathbb{H} = \mathbb{C}F = M$ is a complex Euclidean space in which we may do quantum mechanics.
The isomorphism between $\mathbb{H}$ and $M$ as real vector spaces is made explicit by writing



i> = q + LP e CF, 



(13.1) 



where 



(13.2) 




Then 



(13.3) 



and the Hermitian inner product in $\mathbb{H}$ is



= (f) ■ ip. 



(13.4) 
We may regard arbitrary smooth functions of $\psi$ and $\bar\psi$ as functions of $q$ and $p$ by writing
(with slight abuse of notation)

$$f(\psi,\bar\psi) \equiv f(q + ip, q - ip). \tag{13.5}$$

The chain rule then implies the relations

$$\partial_\psi + \partial_{\bar\psi} = \partial_q, \tag{13.6}$$

$$i(\partial_\psi - \partial_{\bar\psi}) = \partial_p \tag{13.7}$$

for the partial derivatives; using these, it is an easy matter to rewrite the Lie product
(12.17) in the form

f^9 = L{U-gf-g^-ff)- (13.8) 

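Now we consider the classical Hamiltonian

$$H_c(\psi,\bar\psi) := \psi^*H\psi,$$

where $H \in \mathrm{Lin}\,\mathbb{H}$ is a quantum Hamiltonian. Then we find

$$\dot\psi = H_c\angle\psi = -\tfrac{i}{\hbar}H\psi, \tag{13.9}$$

giving the Schrödinger equation

$$i\hbar\dot\psi = H\psi \tag{13.10}$$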

as classical Hamiltonian equation of motion for the state vector $\psi \in \mathbb{H}$. Thus, quantum
mechanics may be discussed in a classical framework. The variational principle for classical
Hamiltonian systems discussed in the context of (12.12), rewritten for the present situation,
is called the Dirac-Frenkel variational principle (Dirac [63], Frenkel [82]). The
action takes the form

$$I(\psi) := \int dt\,\psi^*(t)\Big(i\hbar\frac{d}{dt} - H\Big)\psi(t); \tag{13.11}$$

setting its variation to zero indeed recovers (13.10). The Dirac-Frenkel variational principle
plays an important role in approximation schemes for the dynamics of quantum systems.
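The statement that the Schrödinger equation is a classical Hamiltonian equation of motion can be checked numerically. The sketch below uses its own scaling, which may differ from the normalization fixed in (13.2): units with $\hbar = 1$, real coordinates with $\psi = (q + ip)/\sqrt2$, and $H_c(q,p) = \psi^*H\psi$ for a hypothetical $2\times2$ Hermitian matrix $H$. It evaluates the right-hand sides of the Hamilton equations (12.11) by central differences and compares $\dot\psi = (\dot q + i\dot p)/\sqrt2$ with $-iH\psi$.

```python
import math

# Hypothetical 2x2 Hermitian quantum Hamiltonian (illustrative values, hbar = 1).
H = [[1.0, 0.5 - 0.2j], [0.5 + 0.2j, -0.3]]


def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def psi_of(q, p):
    """Assumed identification psi = (q + i p)/sqrt(2) of R^n x R^n with C^n."""
    return [(qi + 1j * pi) / math.sqrt(2) for qi, pi in zip(q, p)]


def H_classical(q, p):
    """Classical Hamiltonian H_c = psi^* H psi (real for Hermitian H)."""
    psi = psi_of(q, p)
    return sum(a.conjugate() * b for a, b in zip(psi, matvec(H, psi))).real


def partial(f, x, i, h=1e-6):
    """Central-difference partial derivative of f with respect to x[i]."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)


q, p = [0.6, -0.2], [0.1, 0.7]
qdot = [partial(lambda y: H_classical(q, y), p, i) for i in range(2)]   # q' = dH_c/dp
pdot = [-partial(lambda x: H_classical(x, p), q, i) for i in range(2)]  # p' = -dH_c/dq

psidot_hamilton = [(a + 1j * b) / math.sqrt(2) for a, b in zip(qdot, pdot)]
psidot_schroedinger = [-1j * z for z in matvec(H, psi_of(q, p))]  # i psi' = H psi
print(psidot_hamilton)
print(psidot_schroedinger)
```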

In many cases, a viable approximation is obtained by restricting the state vectors $\psi(t)$ to a
linear or nonlinear manifold of easily manageable states $|z\rangle$ (for example coherent states)
parameterized by classical parameters $z$ which can often be given a physical meaning.
Inserting the ansatz $\psi(t) = |z(t)\rangle$ into the action (13.11) gives an action for the path $z(t)$,
and the variational principle for this action defines an approximate classical Lagrangian
(and hence conservative) dynamics for the parameter vector $z(t)$. Thus, the Dirac-Frenkel
variational principle fits in naturally with the interpretation in Section 7.4 of the parameter
vectors characterizing a state as the natural observables. An important application of this
situation is given by the time-dependent Hartree-Fock equations, which are at the heart of
dynamical simulations in quantum chemistry.

We note that $\psi^*\psi$ is a constant of the motion; hence we may restrict the dynamics (13.10)
to normalized state vectors satisfying $\psi^*\psi = 1$. In this case, we may interpret the
function $A_c \in E$ defined for $A \in \mathrm{Lin}\,\mathbb{H}$ by

$$A_c(\psi,\bar\psi) := \psi^*A\psi = \langle A\rangle$$

as the classical value of the quantity $A$ in the pure state defined by the normalized state
vector $\psi$, or, equivalently, by the rank one density matrix

$$\rho = \psi\psi^*. \tag{13.12}$$

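The Lie product of two values is again a value, since one easily calculates

$$\langle A\rangle\angle\langle B\rangle = \big\langle\tfrac{i}{\hbar}[A,B]\big\rangle = \langle A\angle B\rangle, \tag{13.13}$$

where the Lie product on the right-hand side is the quantum bracket. In particular, the
dynamics of the values is given by the Ehrenfest equation

$$\frac{d}{dt}\langle A\rangle = \langle H\rangle\angle\langle A\rangle = \langle H\angle A\rangle = \tfrac{i}{\hbar}\langle[H,A]\rangle. \tag{13.14}$$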

In the special case where $H = T(p) + V(q)$ is expressible as a sum of a kinetic energy
operator $T(p)$ depending on a momentum vector $p$ and of a potential energy operator $V(q)$
depending on a position vector $q$, whose components are operators satisfying the traditional
canonical commutation rules

$$q_j\angle q_k = p_j\angle p_k = 0, \qquad p_j\angle q_k = \delta_{jk},$$

the special cases of the Ehrenfest equation,

$$\frac{d}{dt}\langle q\rangle = \langle\partial_p H(q,p)\rangle = \langle\partial_p T(p)\rangle, \qquad \frac{d}{dt}\langle p\rangle = -\langle\partial_q H(q,p)\rangle = -\langle\partial_q V(q)\rangle,$$

often called the Ehrenfest theorem, are due to Ehrenfest [68]. The Ehrenfest equation,
here derived in the Schrödinger picture, is valid also in the Heisenberg picture (or even
more general interaction pictures); the dynamical objects of physical interest are neither
the states nor the quantities, but the values. We may also compute the dynamics of the
density matrix (13.12), and find the Liouville equation

$$i\hbar\dot\rho = [H(p,q),\rho]. \tag{13.15}$$

More generally, it is not difficult to check that taking (13.13) as a definition of the Lie
product of values in arbitrary states (not necessarily pure states as in the above derivation)
indeed turns the family of $\langle A\rangle$, with $\langle\cdot\rangle$ ranging over states defined by

$$\langle A\rangle := \mathrm{tr}\,\rho A \tag{13.16}$$

for some strongly integrable density matrix $\rho$ and $A$ ranging over the elements of $\mathrm{Lin}\,\mathbb{H}$, into
a Lie algebra. Therefore, the Ehrenfest equation is valid for arbitrary states, not only for
pure states. By inserting (13.16) into the Ehrenfest equation and comparing coefficients,
one also sees that the Liouville equation (13.15) remains valid.
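For a mixed state, the Ehrenfest equation $\frac{d}{dt}\langle A\rangle = \frac{i}{\hbar}\langle[H,A]\rangle$ with values (13.16) can be checked numerically. The sketch assumes $\hbar = 1$ and hypothetical $2\times2$ matrices $H$, $A$ and a mixed density matrix $\rho_0$; it propagates $\rho(t) = e^{-iHt}\rho_0e^{iHt}$, the solution of the Liouville equation (13.15), using a truncated Taylor series for the matrix exponential, and compares a central difference of $\mathrm{tr}\,\rho(t)A$ at $t = 0$ with the right-hand side.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(A):
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]


def expm(A, terms=30):
    """Matrix exponential by a truncated Taylor series (adequate for small ||A||)."""
    n = len(A)
    result = [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def value(rho, A):
    """Value <A> = tr(rho A), cf. (13.16)."""
    prod = matmul(rho, A)
    return sum(prod[i][i] for i in range(len(prod)))


H = [[0.7, 0.2 + 0.4j], [0.2 - 0.4j, -0.5]]       # Hermitian Hamiltonian
A = [[0.0, 1.0], [1.0, 0.0]]                      # observable (Pauli x)
rho0 = [[0.8, 0.1 - 0.05j], [0.1 + 0.05j, 0.2]]   # mixed state: trace 1, positive definite


def rho_at(t):
    U = expm([[-1j * t * h for h in row] for row in H])
    return matmul(matmul(U, rho0), dagger(U))


dt = 1e-4
lhs = (value(rho_at(dt), A) - value(rho_at(-dt), A)) / (2 * dt)
commutator = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(matmul(H, A), matmul(A, H))]
rhs = 1j * value(rho0, commutator)                # (i/hbar) <[H, A]> with hbar = 1
print(lhs, rhs)
```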



13.2 Quantum-classical dynamics 

There are many systems of practical interest which are treated in a hybrid quantum-classical
fashion. The most important example is the Born-Oppenheimer approximation, where
nuclei are treated classically, while electrons remain quantized. Another truly quantum-classical
system is a quantum Boltzmann equation with spin; here the spin is still an
operator, represented by $4\times4$ matrices parameterized by classical phase space variables.
On the other hand, the quantum Boltzmann equation for spin zero is already a purely
classical equation, since its dynamical variables all commute.

In the Liouville picture, where the density matrices are the dynamical variables, the basic
equations for a large class of quantum-classical models are the generalized Liouville
equation

$$i\hbar\dot\rho = [H(p,q),\rho],$$

and the generalized Hamilton equations

$$\dot q = \mathrm{tr}\,\rho\,\partial_p H(p,q), \qquad \dot p = -\mathrm{tr}\,\rho\,\partial_q H(p,q).$$

Here $H \in C^\infty(M, \mathrm{Lin}\,\mathbb{H})$ is an operator-valued function on a classical phase space $M$. Thus
$H(p,q)$ is, for any fixed vectors $p, q$, a linear operator on some Euclidean space $\mathbb{H}$, the
density matrix $\rho = \rho(t)$ is a time-dependent trace-class operator on $\mathbb{H}$, and $q = q(t)$,
$p = p(t)$ are classical, time-dependent vectors, not quantum objects. The classical quantities
are the functions of the values

$$\langle f\rangle := \mathrm{tr}\,\rho f(p,q),$$

where $f$ is a $(p,q)$-dependent operator on $\mathbb{H}$. Expressed in terms of values, we have

$$\dot q = \langle\partial_p H\rangle, \qquad \dot p = -\langle\partial_q H\rangle,$$

which looks like the Ehrenfest theorem, except that on the left-hand side we have classical
variables and no expectations. The equations are conservative equations for the evolution
of values (the value $\langle H\rangle$ of the energy is conserved); dissipative systems and stochastic
systems can also be modelled, but this is beyond the scope of the present exposition.
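The conservation of the energy value $\langle H\rangle = \mathrm{tr}\,\rho H(p,q)$ can be observed in a small model. The sketch below uses a hypothetical toy model, not one from the text: one classical degree of freedom coupled to a two-level quantum system through $H(p,q) = (\frac12p^2 + \frac12q^2)I + q\sigma_z + \Delta\sigma_x$, with $\hbar = 1$ and an assumed coupling $\Delta$. It integrates $\dot\rho = -i[H,\rho]$, $\dot q = \mathrm{tr}\,\rho\,\partial_pH$, $\dot p = -\mathrm{tr}\,\rho\,\partial_qH$ with a classical Runge-Kutta scheme and monitors $\langle H\rangle$, $\mathrm{tr}\,\rho$, and the purity $\mathrm{tr}\,\rho^2$, which stays $1$ because the dynamics preserves the rank of $\rho$.

```python
DELTA = 0.4  # hypothetical tunnelling strength


def H_op(q, p):
    """Operator-valued Hamiltonian H(p, q) = (p^2/2 + q^2/2) I + q sigma_z + DELTA sigma_x."""
    c = 0.5 * p * p + 0.5 * q * q
    return [[c + q, DELTA], [DELTA, c - q]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def field(state):
    """Right-hand side for the state (q, p, rho00, rho01, rho10, rho11)."""
    q, p = state[0], state[1]
    rho = [[state[2], state[3]], [state[4], state[5]]]
    H = H_op(q, p)
    HR, RH = matmul(H, rho), matmul(rho, H)
    rho_dot = [[-1j * (HR[i][j] - RH[i][j]) for j in range(2)] for i in range(2)]  # -i [H, rho]
    tr_rho = (rho[0][0] + rho[1][1]).real
    q_dot = p * tr_rho                                     # tr(rho dH/dp), dH/dp = p I
    p_dot = -(q * tr_rho + (rho[0][0] - rho[1][1]).real)   # -tr(rho dH/dq), dH/dq = q I + sigma_z
    return [q_dot, p_dot, rho_dot[0][0], rho_dot[0][1], rho_dot[1][0], rho_dot[1][1]]


def rk4_step(state, dt):
    k1 = field(state)
    k2 = field([s + dt / 2 * k for s, k in zip(state, k1)])
    k3 = field([s + dt / 2 * k for s, k in zip(state, k2)])
    k4 = field([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6 * (a + 2 * b + 2 * c + d) for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def energy(state):
    """Energy value <H> = tr(rho H(p, q))."""
    rho = [[state[2], state[3]], [state[4], state[5]]]
    prod = matmul(rho, H_op(state[0], state[1]))
    return (prod[0][0] + prod[1][1]).real


def purity(state):
    rho = [[state[2], state[3]], [state[4], state[5]]]
    sq = matmul(rho, rho)
    return (sq[0][0] + sq[1][1]).real


state = [1.0, 0.0, 1.0 + 0j, 0j, 0j, 0j]   # q = 1, p = 0, rho = |0><0| (pure)
E0 = energy(state)
for _ in range(5000):                       # dt = 1e-3, up to t = 5
    state = rk4_step(state, 1e-3)
print("energy drift:", energy(state) - E0)
print("trace:", (state[2] + state[5]).real, "purity:", purity(state))
```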

The quantum-classical dynamics preserves the rank of the density $\rho$. In particular, if $\rho$ has
the rank 1 form

$$\rho = \psi\psi^* \tag{13.17}$$

at some time, it has at any time the form (13.17) with time-dependent $\psi$. The fact that $\rho$
has trace 1 translates into the statement that the state vector $\psi$ is normalized to $\psi^*\psi = 1$.
One easily checks that the Liouville equation holds iff the state vector $\psi$, determined by
(13.17) up to a phase, satisfies the Schrödinger equation

$$i\hbar\dot\psi = H(p,q)\psi.$$

In terms of the state vector, values take the familiar form

$$\langle f(p,q)\rangle = \psi^*f(p,q)\psi.$$


The reader is invited to formulate a Hamiltonian description of quantum-classical systems,
by starting with a symplectic dynamics in which only a part of the position and momentum
variables are complexified into a quantum state vector, and to derive the corresponding
Poisson algebra. Now the Lie product is the tensor product of that of the classical subsystem
and that of the quantum subsystem treated as a classical Hamiltonian system. The
Ehrenfest equation still has the form

$$\frac{d}{dt}\langle f\rangle = \langle H\rangle\angle\langle f\rangle,$$

but the right-hand side no longer simplifies to the value of a commutator; instead, one gets
a nonlinear dependence on values. Such nonlinearities are common for reduced descriptions
coming from a pure quantum theory by coarse graining. Usually, quantum-classical systems
are regarded as reduced descriptions, and the same phenomenon occurs. There are plenty of
other examples of practical importance, the primary one being the Schrödinger-Poisson
equations in semiconductor modeling.

13.2.1 Examples. We mention two important examples, molecular quantum chemistry
and a spinning electron.

(i) The Born-Oppenheimer approximation of the dynamics of molecules, widely used
in quantum chemistry, is a typical quantum-classical system of the above kind. The nuclei
are described by classical phase space variables, while the electrons are described quantum
mechanically by means of a state vector $\psi$ in a Hilbert space of antisymmetrized electron
wave functions.

(ii) A spinning electron, while having no purely classical description, can be modelled
quantum-classically by classical phase space variables $p, q$ and a quantum 4-component spin.
Then, with $\alpha, \beta$ as in the Dirac equation,

$$H(p,q) = \alpha\cdot p + \beta m + eV(q) \tag{13.18}$$

is a $4\times4$ matrix parameterized by classical 3-vectors $p = p(t)$ and $q = q(t)$, $\rho = \rho(t)$ is a
positive semidefinite $4\times4$ matrix of trace 1, and the trace in the above equations is just the
trace of a $4\times4$ matrix.

One gets the equations from Dirac's equation and Ehrenfest's theorem by an approximation
involving coherent states for position and momentum. Note that this is just a toy example.
More useful field theoretic quantum-classical versions lead to Vlasov equations for $(p,q)$-dependent
$4\times4$ densities, describing a fluid of independent classical electrons of the form
(13.18). With even more realism, one needs to add also a collision term accounting for
interactions, resulting in a quantum Boltzmann equation; and for even more accurate
modeling, (13.18) is no longer adequate but needs additional dissipative terms.



The quantum-classical dynamics, given in the Schrödinger picture, can also be written in
the Heisenberg picture. The equivalent Heisenberg dynamics is

$$\dot f = \partial_q f\,\langle\partial_p H\rangle - \partial_p f\,\langle\partial_q H\rangle + \tfrac{i}{\hbar}[H,f],$$

where now $\langle\cdot\rangle$ is the fixed Heisenberg state. From this, one can immediately see that
everything depends only on values by applying $\langle\cdot\rangle$ to this equation:

$$\frac{d}{dt}\langle f\rangle = \langle\partial_q f\rangle\langle\partial_p H\rangle - \langle\partial_p f\rangle\langle\partial_q H\rangle + \big\langle\tfrac{i}{\hbar}[H,f]\big\rangle.$$

This is now a fully classical equation for classical values of the quantum-classical hybrid
model considered.

In the interpretation given in Chapter 7, densities are irreducible objects describing a single
quantum system, not stochastic entities that make sense only under repetition. (This is
analogous to the way phase space densities appear in the Boltzmann equation, though the
analogy is not very deep.)

In general, values in the quantum-classical dynamics are to be interpreted as objects characterizing
a single quantum system, in the sense of the consistent experiment interpretation,
and not as the result of averaging over many realizations.

By design, in the Heisenberg picture, the state does not take part in the dynamics. What
is new, however, compared to pure quantum dynamics is that the Heisenberg state occurs
explicitly in the differential equation. In practical applications, the Heisenberg state is
fixed by the experimental setting; hence this state dependence of the dynamics is harmless.
However, because the dynamics depends on the Heisenberg state, calculating results by
splitting a density at a given time into a mixture of pure states no longer makes sense. One
gets different evolutions of the operators in different pure states, and there is no reason why
their combination should at the end give the correct dynamics of the original density. (And
indeed, this will usually fail.) This splitting is already artificial in pure quantum mechanics
since there is no natural way to tell which pure states a mixed state is composed of. But
there the splitting happens to be valid and useful as a calculational tool since the dynamics
in the Heisenberg picture is state independent.

In contrast to the pure quantum case, there is now a difference between averaging the results of
two experiments $\rho_1$, $\rho_2$ and the results of a single experiment $\rho$ given by $(\rho_1 + \rho_2)/2$. That,
in ordinary quantum theory, the two are indistinguishable in their statistical properties is
a coincidental consequence of the linearity of the Schrödinger equation and the resulting
state independence of the Heisenberg equation; it no longer holds in effective quantum
theories where nonlinearities appear due to a reduced description.



13.3 Deformation quantization 

There are many ways to quantize a classical system. 

An important algebraic approach is Berezin quantization, also called the method of orbits. Here classical Poisson representations of Lie algebras are lifted to unitary representations. We only hint at the constructions, and refer for details to Berezin [27], Bar-Moshe & Marinov [19], Landsman [152], and Kirillov [135].

The construction of the Lie-Poisson algebra in Section 9.5 from a Lie *-algebra $L$ implies that the dual of $L$ becomes in a natural way a Poisson manifold. The corresponding symplectic leaves are the so-called co-adjoint orbits, the orbits of the universal covering group corresponding to $L$ in its co-adjoint action on $L^*$. The canonical Poisson algebras on the co-adjoint orbits carry an irreducible Poisson representation of $L$, and any irreducible Poisson representation of $L$ arises in this way (up to equivalence). Thus, classifying the co-adjoint orbits is the classical analogue of classifying irreducible unitary representations.

The quantization constructions mentioned above rely on close relations between co-adjoint orbits, coherent states over Lie groups, and irreducible unitary representations. These relations can be generalized even further, replacing the Lie algebra structure by a purely geometric setting. This leads to the framework of geometric quantization; cf. Woodhouse [263].

Another possibility is deformation quantization, which deforms a commutative product into a so-called Moyal product; for definitions and details, see, e.g., Rieffel [210]. Alternatively, deformation quantization may be viewed as a deformation of the quantities in a Poisson algebra $E$. This is obtained by embedding $E$ into the algebra $\mathrm{Lin}\,E$, identifying $f \in E$ with the multiplication mapping $M_f$ which maps $g$ to $M_f g := fg$. This is the procedure we shall discuss in more detail.

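Recall the linear operator $\mathrm{ad}_f \in \mathrm{Lin}\,E$ defined by

$$\mathrm{ad}_f\{g\} := f \angle g. \qquad (13.19)$$

For $f \in E$, we define the quantization $\hat f$ of $f$ by

$$\hat f := f - \frac{i\hbar}{2}\,\mathrm{ad}_f. \qquad (13.20)$$

For emphasis, we write the arguments (in $E$) of operators from $\mathrm{Lin}\,E$ in curly braces; such operators are often referred to as superoperators. Note that the quantization preserves nonlinear operations (product and Lie product) only up to terms of formal order $O(\hbar)$. This reflects the ordering ambiguity in traditional quantization procedures. For an arbitrary Gibbs state on $\mathrm{Lin}\,E$, the expectation

$$\langle \hat f\rangle = \langle f\rangle - \frac{i\hbar}{2}\langle \mathrm{ad}_f\rangle$$

differs from that of $f$ by a term of numerical order $O(\hbar)$, justifying an interpretation in terms of deformation.

13.3.1 Proposition. For $f, g$ in a not necessarily commutative Poisson algebra $E$,

$$[\mathrm{ad}_f, g] = [f, \mathrm{ad}_g] = f \angle g, \qquad (13.21)$$

$$[\mathrm{ad}_f, \mathrm{ad}_g] = \mathrm{ad}_{f\angle g}, \qquad (13.22)$$

$$[\hat f, g] = [f, \hat g] = [f,g] - \frac{i\hbar}{2}\, f\angle g, \qquad (13.23)$$

$$[\hat f, \hat g] = [f,g] - i\hbar\, f\angle g - \frac{\hbar^2}{4}\,\mathrm{ad}_{f\angle g}. \qquad (13.24)$$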



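Proof. We have, for all $h \in E$,

$$[\mathrm{ad}_f, g]\{h\} = \mathrm{ad}_f\{gh\} - g\,\mathrm{ad}_f\{h\} = f\angle(gh) - g(f\angle h) = (f\angle g)h,$$

hence $[\mathrm{ad}_f, g] = f\angle g$. Therefore, also $[f, \mathrm{ad}_g] = -[\mathrm{ad}_g, f] = -g\angle f = f\angle g$, so that (13.21) holds. Similarly,

$$[\mathrm{ad}_f, \mathrm{ad}_g]\{h\} = \mathrm{ad}_f\{\mathrm{ad}_g\{h\}\} - \mathrm{ad}_g\{\mathrm{ad}_f\{h\}\} = f\angle(g\angle h) - g\angle(f\angle h) = (f\angle g)\angle h = \mathrm{ad}_{f\angle g}\{h\}$$

by (S5), hence (13.22) holds. (13.23) follows from

$$[\hat f, g] = \Big[f - \frac{i\hbar}{2}\mathrm{ad}_f,\ g\Big] = [f,g] - \frac{i\hbar}{2}[\mathrm{ad}_f, g] = [f,g] - \frac{i\hbar}{2}\, f\angle g$$

and

$$[f, \hat g] = -[\hat g, f] = -[g,f] + \frac{i\hbar}{2}\, g\angle f = [f,g] - \frac{i\hbar}{2}\, f\angle g.$$

Finally, since

$$[\hat f, \hat g] = \Big[f - \frac{i\hbar}{2}\mathrm{ad}_f,\ g - \frac{i\hbar}{2}\mathrm{ad}_g\Big] = [f,g] - \frac{i\hbar}{2}[\mathrm{ad}_f, g] - \frac{i\hbar}{2}[f, \mathrm{ad}_g] + \Big(\frac{i\hbar}{2}\Big)^2[\mathrm{ad}_f, \mathrm{ad}_g],$$

(13.24) follows from (13.21) and (13.22).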



□ 
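The identities of Proposition 13.3.1 can be checked mechanically in the simplest commutative case. The following Python sketch is illustrative only. It makes these assumptions:

- one degree of freedom;
- polynomial phase space functions stored as dictionaries;
- the bracket convention $p\angle f = \partial_q f$ of (13.28);
- $\hbar = 1$.

Superoperators in $\mathrm{Lin}\,E$ are modeled as Python functions, and all helper names (`lie`, `quantize`, `comm`, ...) are hypothetical.

```python
# Polynomials in (p, q) are dicts {(i, j): coeff} meaning sum of coeff * p**i * q**j.
HBAR = 1.0  # illustrative value; the identities hold for any hbar

def add(*polys):
    out = {}
    for poly in polys:
        for mono, c in poly.items():
            out[mono] = out.get(mono, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def scale(c, f):
    return add({m: c * v for m, v in f.items()})

def mul(f, g):
    out = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return add(out)

def d_p(f):
    return add({(i - 1, j): i * c for (i, j), c in f.items() if i > 0})

def d_q(f):
    return add({(i, j - 1): j * c for (i, j), c in f.items() if j > 0})

def lie(f, g):
    """Poisson bracket f∠g = ∂_p f ∂_q g - ∂_q f ∂_p g, so that p∠f = ∂_q f."""
    return add(mul(d_p(f), d_q(g)), scale(-1, mul(d_q(f), d_p(g))))

# Superoperators are Python functions acting on polynomials h.
def M(f):
    return lambda h: mul(f, h)                                       # M_f{h} = f h

def ad(f):
    return lambda h: lie(f, h)                                       # ad_f{h} = f∠h

def quantize(f):
    return lambda h: add(mul(f, h), scale(-0.5j * HBAR, lie(f, h)))  # f - (i hbar/2) ad_f

def comm(A, B):
    return lambda h: add(A(B(h)), scale(-1, B(A(h))))

def equal(f, g, tol=1e-12):
    return all(abs(f.get(m, 0) - g.get(m, 0)) < tol for m in set(f) | set(g))

f = {(2, 1): 1, (0, 1): 3}      # p^2 q + 3 q
g = {(1, 2): 1, (1, 0): 1}      # p q^2 + p
h = {(3, 0): 1, (1, 2): 2}      # test argument p^3 + 2 p q^2
fg = lie(f, g)

assert equal(comm(ad(f), M(g))(h), mul(fg, h))                  # (13.21)
assert equal(comm(ad(f), ad(g))(h), ad(fg)(h))                  # (13.22)
assert equal(comm(quantize(f), quantize(g))(h),                 # (13.24) with [f, g] = 0
             add(scale(-1j * HBAR, mul(fg, h)), scale(-HBAR**2 / 4, lie(fg, h))))
```

Because $E$ is commutative here, $[f,g]=0$, and the check of (13.24) reduces to the two $\hbar$-dependent terms.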



To actually quantize a classical theory, one may choose a Lie algebra of relevant quantities 
generating the Poisson algebra, quantize its elements by the above rule, express the classical 
action as a suitably ordered polynomial expression in the generators, and use as quantum 
action this expression with all generators replaced by their quantizations. 

In general, the above recipe for phase space quantization gives an approximate Poisson isomorphism, up to $O(\hbar)$ terms. We now show, however, that Lie subalgebras are mapped into (perhaps slightly bigger) Lie algebras defining an abelian extension. We also show that one gets a true isomorphism for all embedded Heisenberg Lie algebras and all embedded abelian Lie algebras.

13.3.2 Theorem. (Quantization Theorem)

If $E$ is commutative then, with

$$M_f\{g\} := fg, \qquad Q_f\{g\} := \hat f\{g\} = fg - \frac{i\hbar}{2}\, f\angle g,$$

the quantum Lie product

$$A \angle B := \frac{i}{\hbar}[A,B] \quad \text{for } A, B \in \mathrm{Lin}\,E$$

satisfies, for $f, g \in E$,

$$Q_f \angle Q_g = M_{f\angle g} - \frac{i\hbar}{4}\,\mathrm{ad}_{f\angle g} = \frac12\big(M_{f\angle g} + Q_{f\angle g}\big), \qquad (13.25)$$

$$Q_f \angle M_g = M_f \angle Q_g = \frac12 M_{f\angle g}. \qquad (13.26)$$

Any Lie subalgebra $L$ of $E$ defines a Lie algebra

$$\hat L := \{M_{f\angle g} + Q_h \mid f, g, h \in L\} \qquad (13.27)$$

under the quantum Lie product. If $L$ is an abelian Lie algebra or a Heisenberg Lie algebra, then $Q$ is a Lie isomorphism between $L$ and $\hat L$.

Proof. Since $E$ is commutative, the first term in (13.24) and (13.23) vanishes, and multiplication by $i/\hbar$ gives (13.25) and (13.26). The final statement is immediate from (13.25) and (13.26). □



Note the so-called Groenewold-van Hove theorem (Groenewold [101], van Hove [250], Gotay et al. [96]). By this theorem, no quantization procedure can exist which possesses all features desirable from a naive point of view. The present quantization procedure sacrifices the exact preservation of commutation rules.



13.4 The Wigner transform 



We now specialize the preceding to the standard symplectic Poisson algebra $E = C^\infty(\mathbb R^n \times \mathbb R^n)$. Thus, $E$ is a commutative Poisson algebra of phase space functions as discussed in Section 12.2. This very important case covers $N$-particle quantum mechanics. In it, the embedding discussed in Section 13.3 turns out to be equivalent to standard quantization. The equivalence is given in terms of the Wigner transform, whose properties we derive now.

In the present special case, phase space quantization amounts to using the reducible representation

$$\hat p = p - \frac{i\hbar}{2}\partial_q, \qquad \hat q = q + \frac{i\hbar}{2}\partial_p$$

of the canonical commutation rules on phase space functions. This replaces the traditional irreducible position representation

$$\hat p = -i\hbar\,\partial_x, \qquad \hat q = x$$

on functions of configuration space, or the irreducible momentum representation

$$\hat p = p, \qquad \hat q = i\hbar\,\partial_p$$

on functions of momentum space. The momentum representation is obtained from the position representation by the simple canonical transformation which interchanges $x$ and $p$, followed by writing $q' = -p$, $p' = q$. It is therefore enough to discuss in the following the transformation to the position representation.
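Both the reducible phase space representation and the position representation should satisfy $\hat p\angle\hat q = \frac{i}{\hbar}[\hat p,\hat q] = 1$. This can be spot-checked on polynomial test functions. The following minimal Python sketch assumes one degree of freedom and $\hbar = 1/2$ (a power of two, chosen only so that floating-point arithmetic stays exact). It uses the quantum Lie product of Theorem 13.3.2. The helper names are hypothetical.

```python
HBAR = 0.5  # illustrative; a power of two keeps the arithmetic exact

# Polynomials are dicts {exponent tuple: coefficient}.
def add(f, g, c=1):
    """Return f + c*g."""
    out = dict(f)
    for m, v in g.items():
        out[m] = out.get(m, 0) + c * v
    return {m: v for m, v in out.items() if v != 0}

def times_var(f, k):
    """Multiply f by its k-th variable."""
    out = {}
    for m, v in f.items():
        e = list(m)
        e[k] += 1
        out[tuple(e)] = v
    return out

def diff(f, k):
    """Partial derivative of f with respect to its k-th variable."""
    out = {}
    for m, v in f.items():
        if m[k] > 0:
            e = list(m)
            e[k] -= 1
            out[tuple(e)] = m[k] * v
    return out

def close(f, g, tol=1e-12):
    return all(abs(f.get(m, 0) - g.get(m, 0)) < tol for m in set(f) | set(g))

def quantum_lie(A, B, f):
    """Quantum Lie product (i/hbar)[A, B] applied to f."""
    comm = add(A(B(f)), B(A(f)), -1)
    return {m: 1j / HBAR * v for m, v in comm.items()}

# Phase space representation on polynomials in (p, q).
P, Q = 0, 1
def p_phase(f):
    return add(times_var(f, P), diff(f, Q), -0.5j * HBAR)   # p - (i hbar/2) d/dq
def q_phase(f):
    return add(times_var(f, Q), diff(f, P), 0.5j * HBAR)    # q + (i hbar/2) d/dp

# Position representation on polynomials in x.
def p_position(f):
    return {m: -1j * HBAR * v for m, v in diff(f, 0).items()}   # -i hbar d/dx
def q_position(f):
    return times_var(f, 0)                                     # multiplication by x

phase_test = {(3, 1): 2, (0, 2): -1, (1, 1): 5}   # 2 p^3 q - q^2 + 5 p q
position_test = {(4,): 1, (1,): -3}               # x^4 - 3 x

assert close(quantum_lie(p_phase, q_phase, phase_test), phase_test)
assert close(quantum_lie(p_position, q_position, position_test), position_test)
```

The phase space operators act on functions of two variables while the position operators act on functions of one. This is the redundancy that the Wigner transform mediates.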
By quantizing in phase space, one gives up irreducibility (and hence the description of a state by a unique density) but gains in simplicity. This may be compared to the situation in gauge theory. There, the description by gauge potentials introduces some arbitrariness, which is the price paid for a more elegant formulation of the field equations but which does not affect the observable consequences.

We now show that these representations are related by a Wigner transform (cf. Wigner [262]).

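We consider the quantization of the commutative Poisson algebra $E = C^\infty(\mathbb R^n \times \mathbb R^n)$ with standard Poisson bracket (12.17). Since for $f = f(p,q)$ we have

$$p \angle f = \partial_q f, \qquad q \angle f = -\partial_p f, \qquad (13.28)$$

the quantization rule amounts to

$$\hat p = p - \frac{i\hbar}{2}\partial_q, \qquad \hat q = q + \frac{i\hbar}{2}\partial_p. \qquad (13.29)$$

By (13.25),

$$\hat p_\mu \angle \hat q_\nu = p_\mu \angle q_\nu = \delta_{\mu\nu}. \qquad (13.30)$$

Thus we have a unitary representation of the standard Heisenberg algebra on phase space. To relate this representation to the traditional position representation given by

$$\hat p = -i\hbar\,\partial_x, \qquad \hat q = x, \qquad (13.31)$$

we introduce the Wigner transform

$$\check f(x,y) := \int dp\, e^{ip\cdot(x-y)/\hbar}\, f\Big(p, \frac{x+y}{2}\Big) \qquad (13.32)$$

of a function $f \in C^\infty(\mathbb R^n \times \mathbb R^n)$.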

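13.4.1 Theorem. The Wigner transform has the inverse transform

$$f(p,q) = h^{-n}\int d\xi\, e^{-ip\cdot\xi/\hbar}\, \check f\Big(q + \frac{\xi}{2},\ q - \frac{\xi}{2}\Big), \qquad (13.33)$$

where

$$h := 2\pi\hbar, \qquad n = \dim p = \dim q, \qquad (13.34)$$

and satisfies the rules

$$\check{(\hat p f)} = \hat p\,\check f, \qquad \check{(\hat q f)} = \hat q\,\check f. \qquad (13.35)$$

Here, on the left, $\hat p$ and $\hat q$ are given by (13.29); on the right, they are given by (13.31).

Proof. We have

$$\int d\xi\, e^{-ip\cdot\xi/\hbar}\,\check f\Big(q+\frac\xi2,\ q-\frac\xi2\Big) = \int d\xi\, e^{-ip\cdot\xi/\hbar}\int dk\, e^{ik\cdot\xi/\hbar} f(k,q) = \int dk\Big(\int d\xi\, e^{i(k-p)\cdot\xi/\hbar}\Big) f(k,q) = \int dk\, h^n\delta(k-p) f(k,q) = h^n f(p,q),$$

proving (13.33). (13.35) follows from

$$\hat p\,\check f(x,y) = -i\hbar\,\partial_x\int dp\, e^{ip\cdot(x-y)/\hbar} f\Big(p,\frac{x+y}{2}\Big) = -i\hbar\int dp\, e^{ip\cdot(x-y)/\hbar}\Big(\frac{i}{\hbar}p + \frac12\partial_q\Big) f\Big(p,\frac{x+y}{2}\Big)$$
$$= \int dp\, e^{ip\cdot(x-y)/\hbar}\Big(p - \frac{i\hbar}{2}\partial_q\Big) f\Big(p,\frac{x+y}{2}\Big) = \check{(\hat p f)}(x,y)$$

and

$$\hat q\,\check f(x,y) = x\,\check f(x,y) = \int dp\, e^{ip\cdot(x-y)/\hbar}\Big(\frac{x+y}{2} + \frac{i\hbar}{2}\partial_p\Big) f\Big(p,\frac{x+y}{2}\Big) = \check{(\hat q f)}(x,y).$$

In the middle step, we write $x = \frac{x+y}{2} + \frac{x-y}{2}$ and integrate by parts in $p$. □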
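A quick numerical sanity check of the inversion formula (13.33) and of the $\hat q$ rule in (13.35) follows, as a sketch only. It makes these assumptions:

- one degree of freedom;
- $\hbar = 1$;
- a Gaussian test function chosen for illustration;
- plain trapezoidal quadrature on truncated intervals, which is accurate here only because the integrands decay rapidly.

All function names are hypothetical.

```python
import cmath
import math

HBAR = 1.0
H_PLANCK = 2 * math.pi * HBAR   # h = 2*pi*hbar as in (13.34); one degree of freedom, so n = 1

def trapezoid(func, a, b, steps):
    """Composite trapezoidal rule on [a, b]."""
    dx = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * dx)
    return total * dx

def wigner(f, x, y, cutoff=8.0, steps=160):
    """(13.32), with the p-integral truncated to [-cutoff, cutoff]."""
    mid = 0.5 * (x + y)
    return trapezoid(lambda p: cmath.exp(1j * p * (x - y) / HBAR) * f(p, mid),
                     -cutoff, cutoff, steps)

def inverse_wigner(kernel, p, q, cutoff=16.0, steps=320):
    """(13.33), with the xi-integral truncated to [-cutoff, cutoff]."""
    integrand = lambda xi: cmath.exp(-1j * p * xi / HBAR) * kernel(q + xi / 2, q - xi / 2)
    return trapezoid(integrand, -cutoff, cutoff, steps) / H_PLANCK

def f(p, q):
    """Test function: a Gaussian, so truncation and quadrature errors are negligible."""
    return math.exp(-p * p - q * q)

def qhat_f(p, q):
    """(q + (i hbar/2) d/dp) f for the Gaussian above, using d/dp f = -2 p f."""
    return q * f(p, q) + 0.5j * HBAR * (-2 * p * f(p, q))

recovered = inverse_wigner(lambda x, y: wigner(f, x, y), 0.3, -0.2)
lhs = 0.4 * wigner(f, 0.4, -0.7)    # x times the Wigner transform of f at (x, y) = (0.4, -0.7)
rhs = wigner(qhat_f, 0.4, -0.7)     # Wigner transform of (q-hat f) at the same point
print(abs(recovered - f(0.3, -0.2)), abs(lhs - rhs))   # both should be tiny
```

For a Gaussian, the Wigner transform is also available in closed form, e.g. $\check f(x,x) = \sqrt\pi\, e^{-x^2}$, which gives an independent check on the quadrature.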



Thus the Wigner transform provides an isomorphism between the two representations. Note that the phase space representation is highly redundant, since the position representation does not act at all on the $y$-coordinate. The redundancy is apparent from the fact that the algebra generated by $\hat p$ and $\hat q$ is much smaller than $\mathrm{Lin}\,E$. In fact, via the Wigner transform it is isomorphic (modulo convergence issues) to $\mathrm{Lin}\,C^\infty(\mathbb R^n)$. However, this redundancy is very helpful, since it makes the classical limit and the approximation by semiclassical techniques much simpler.

The Wigner transform can be applied to all nonlinear PDEs of Schrödinger or Dirac type. These have the form

$$I(\psi\psi^*)\psi = 0, \qquad \psi \in C^\infty(\mathbb R^n), \qquad (13.36)$$

where $I$ is an operator-valued function of the density matrix

$$\rho := \psi\psi^*. \qquad (13.37)$$

They can be rewritten in terms of it as the equation

$$I(\rho)\rho = 0,$$

which after an inverse Wigner transform becomes an equation

$$I(\rho)\rho = 0 \qquad (13.38)$$

in phase space. However, (13.37) loses the rank 1 condition implicit in (13.36), and hence corresponds to a "mixing" of pure states.
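13.4.2 Proposition. The bilinear inner product

$$(f|g) := \int dp\,dq\, f(p,q)\,g(p,q) \qquad (13.39)$$

satisfies

$$(f|g) = h^{-n}\int dx\,dy\, \check f(x,y)\,\check g(y,x). \qquad (13.40)$$

Proof. This follows from

$$\int dx\,dy\,\check f(x,y)\check g(y,x) = \int dx\,dy\int dp\, e^{ip\cdot(x-y)/\hbar} f\Big(p,\frac{x+y}{2}\Big)\int dk\, e^{ik\cdot(y-x)/\hbar} g\Big(k,\frac{x+y}{2}\Big)$$
$$= \int dp\,dk\,dx\,dy\, e^{i(p-k)\cdot(x-y)/\hbar} f\Big(p,\frac{x+y}{2}\Big) g\Big(k,\frac{x+y}{2}\Big)$$
$$= \int dp\,dk\,dq\,dw\, e^{i(p-k)\cdot w/\hbar} f(p,q)\, g(k,q)$$
$$= \int dp\,dk\,dq\, h^n\delta(p-k) f(p,q)\, g(k,q)$$
$$= h^n\int dp\,dq\, f(p,q)\,g(p,q) = h^n (f|g).$$

Here we substituted $q = (x+y)/2$ and $w = x - y$, a change of variables with unit Jacobian.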



□ 



Under conjugation, we have directly from (13.32),

$$\check f^*(x,y) = \check f(y,x) \iff f^*(p,q) = f(-p,q), \qquad (13.41)$$

indicating that the complex combination

$$z := q + ip$$

behaves naturally.

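13.4.3 Proposition. The conditional expectation of an operator $A \in \mathcal S(\Omega, L)$ at fixed $x$ or $k$, respectively, defined by

$$\langle A\rangle_x := \int dk\, \rho(k,x) A, \qquad \langle A\rangle_k := \int dx\, \rho(k,x) A, \qquad (13.42)$$

satisfies

$$\langle A(\hat p)\rangle_k = A(k)\langle 1\rangle_k, \qquad \langle A(\hat q)\rangle_x = A(x)\langle 1\rangle_x, \qquad (13.43)$$

and

$$\langle A\rangle := \int dk\,dx\, A\rho(k,x) = \int dk\,\langle A\rangle_k = \int dx\,\langle A\rangle_x. \qquad (13.44)$$

Thus $\hat p$ is the momentum operator and $\hat q$ the position operator.

Proof. The proof is straightforward. □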
13.4.4 Theorem. With the pointwise convolution

$$(f*g)(p,q) := \int dk\, f(k,q)\,g(p-k,q), \qquad (13.45)$$

we have for the pointwise product

$$\check f\,\check g = \check{(f*g)}. \qquad (13.46)$$

Proof. By (13.33), we have $\check f\,\check g = \check e$, where

$$h^n e(p,q) = \int d\xi\, e^{-ip\cdot\xi/\hbar}\int dk\, e^{ik\cdot\xi/\hbar}f(k,q)\int dl\, e^{il\cdot\xi/\hbar}g(l,q)$$
$$= \int dk\,dl\Big(\int d\xi\, e^{i(k+l-p)\cdot\xi/\hbar}\Big)f(k,q)\,g(l,q)$$
$$= \int dk\,dl\, h^n\delta(k+l-p)f(k,q)\,g(l,q)$$
$$= h^n\int dk\, f(k,q)\,g(p-k,q) = h^n(f*g)(p,q).$$



□ 



A comparison with (13.35) shows that, formally, 

P*f^Pf, (l*f = Qf- 
For a fully localized phase space function, (13.32) implies directly

$$f(p,q) = f(q) \implies \check f(x,y) = h^n f(x)\,\delta(x-y), \qquad (13.47)$$

corresponding to Lagrangian terms $\psi(x)^* f(x)\psi(x)$.

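The symbol formulation. To extend the quantization rule to arbitrary smooth functions, we need the symbol formulation of $\hat p$ and $\hat q$.

Let $\Omega := \mathbb R^n \times \mathbb R^n$ be the classical phase space, and let

$$j := i\hbar/2. \qquad (13.48)$$

13.4.5 Proposition. If $\rho \in \mathcal S(\Omega, E_0)$ then

$$\rho(k,x) = \int \tilde\rho(k+p,\ x+q)\, e^{-p\cdot q/j}\,dp\,dq, \qquad (13.49)$$

where

$$\tilde\rho(p,q) := (\pi\hbar)^{-2n}\int \rho(k+p,\ x+q)\, e^{k\cdot x/j}\,dk\,dx. \qquad (13.50)$$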
Proof. We use the formula

$$\int f(q)\, e^{-p\cdot q/j}\,dp\,dq = (\pi\hbar)^n f(0), \qquad (13.51)$$

and find for $\tilde\rho$ defined by (13.50):

$$\int \tilde\rho(k+p,\ x+q)\, e^{-p\cdot q/j}\,dp\,dq$$
$$= (\pi\hbar)^{-2n}\int \rho(k'+k+p,\ x'+x+q)\, e^{k'\cdot x'/j}\, e^{-p\cdot q/j}\,dk'\,dx'\,dp\,dq$$
$$= (\pi\hbar)^{-2n}\int \rho(k+p',\ x+q')\, e^{p'\cdot q'/j}\, e^{-p\cdot q'/j}\, e^{-p'\cdot q/j}\,dp\,dq'\,dq\,dp'$$
$$= (\pi\hbar)^{-n}\int \rho(k,\ x+q')\, e^{-p\cdot q'/j}\,dp\,dq'$$
$$= \rho(k,x).$$

We used the substitution $k' = p' - p$, $x' = q' - q$. □

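13.4.6 Proposition. For $A \in \mathcal S(\Omega, E_0)$, the symbol $A(\hat p, \hat q)$ defined by

$$A(\hat p, \hat q)\rho(k,x) := \int_\Omega A(k+p,\ x-q)\,\tilde\rho(k+p,\ x+q)\, e^{-p\cdot q/j}\,dp\,dq \qquad (13.52)$$

is consistent with the interpretation

$$\hat p = k + j\partial_x, \qquad \hat q = x - j\partial_k \qquad (13.53)$$

when $A$ is a normally ordered polynomial acting on $\rho(k,x)$ in which all $\hat p_\nu$ are to the right of all $\hat q_\mu$. Moreover, we have the canonical commutation rules (CCR)

$$[\hat p_\mu, \hat q_\nu] = 2j\,\delta_{\mu\nu} = i\hbar\,\delta_{\mu\nu}. \qquad (13.54)$$

Proof. We can rewrite (13.49) as

$$\rho(k,x) = \int \tilde\rho(k',x')\, e^{-(k'-k)\cdot(x'-x)/j}\,dk'\,dx',$$

and find inductively for a monomial $\hat q^\alpha \hat p^\beta$ (with multiexponents)

$$\hat q^\alpha\hat p^\beta\rho(k,x) = \int k'^\beta (2x - x')^\alpha\,\tilde\rho(k',x')\, e^{-(k'-k)\cdot(x'-x)/j}\,dk'\,dx'$$
$$= \int (k+p)^\beta (x-q)^\alpha\,\tilde\rho(k+p,\ x+q)\, e^{-p\cdot q/j}\,dp\,dq.$$

Now take linear combinations with coefficients from $E_0$ to find (13.52). The CCR follow since (with $f_u := \partial f/\partial u$)

$$\hat p_\mu\hat q_\nu f = \hat p_\mu(x_\nu f - j f_{k_\nu}) = k_\mu(x_\nu f - j f_{k_\nu}) + j(x_\nu f - j f_{k_\nu})_{x_\mu}$$
$$= k_\mu x_\nu f + j(\delta_{\mu\nu} f + x_\nu f_{x_\mu} - k_\mu f_{k_\nu}) - j^2 f_{k_\nu x_\mu},$$

$$\hat q_\nu\hat p_\mu f = k_\mu x_\nu f + j(-\delta_{\mu\nu} f + x_\nu f_{x_\mu} - k_\mu f_{k_\nu}) - j^2 f_{k_\nu x_\mu},$$

hence

$$[\hat p_\mu, \hat q_\nu] f = 2j\,\delta_{\mu\nu} f.$$

This also implies (13.53). □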
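The commutator computation in this proof can be replayed on a polynomial test density. The following minimal Python sketch assumes one degree of freedom and $\hbar = 1/2$, so that $j = i/4$ and all floating-point arithmetic is exact. Densities are stored as dictionaries keyed by the exponents of $(k, x)$. The helper names are hypothetical.

```python
HBAR = 0.5
J = 0.5j * HBAR        # j = i*hbar/2 as in (13.48)

def add(f, g, c=1):
    """Return f + c*g for polynomials stored as {(deg_k, deg_x): coefficient}."""
    out = dict(f)
    for m, v in g.items():
        out[m] = out.get(m, 0) + c * v
    return {m: v for m, v in out.items() if v != 0}

def times_var(f, idx):
    """Multiply by the variable with index idx (0 for k, 1 for x)."""
    out = {}
    for m, v in f.items():
        e = list(m)
        e[idx] += 1
        out[tuple(e)] = v
    return out

def diff(f, idx):
    """Partial derivative with respect to the variable with index idx."""
    out = {}
    for m, v in f.items():
        if m[idx] > 0:
            e = list(m)
            e[idx] -= 1
            out[tuple(e)] = m[idx] * v
    return out

K, X = 0, 1

def p_op(f):
    return add(times_var(f, K), diff(f, X), J)     # p-hat = k + j d/dx, (13.53)

def q_op(f):
    return add(times_var(f, X), diff(f, K), -J)    # q-hat = x - j d/dk, (13.53)

rho = {(2, 1): 1.5, (0, 3): -2, (1, 0): 1j}        # 1.5 k^2 x - 2 x^3 + i k
commutator = add(p_op(q_op(rho)), q_op(p_op(rho)), -1)
expected = {m: 2 * J * v for m, v in rho.items()}  # 2j * rho = i*hbar * rho
assert commutator == expected
```

The exact dictionary equality relies on $\hbar$ being a power of two. For other values of $\hbar$, compare coefficients with a tolerance instead.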

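Write $\operatorname{Jm}(f) := \frac{1}{2j}(f - f^*)$, so that

$$\operatorname{Jm}(a + jb) = b \quad \text{if } a, b \in E_0 \text{ are Hermitian}.$$

13.4.7 Proposition. If

$$\rho(-k,x) = \rho(k,x) \quad \text{for all } (k,x) \in \Omega, \qquad (13.55)$$

then the dynamics

$$\dot\rho = \{H, \rho\}, \qquad (13.56)$$

where

$$\{H,\rho\} := \frac{1}{2j}\big(H\rho - (H\rho)^*\big) = \operatorname{Jm}(H\rho), \qquad (13.57)$$

leaves the total density $\langle 1\rangle$ invariant. Moreover, if for all $A$ we have $\langle A^*A\rangle \ge 0$ at time $t = 0$, then $\langle A^*A\rangle \ge 0$ at all times $t > 0$.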

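13.4.8 Proposition. (Classical limit)

In the limit $j \to 0$,

$$A(\hat p, \hat q) = A(k,x) + O(\hbar),$$

and

$$\{H(\hat p,\hat q), \rho\}(p,q) = \partial_p H\,\partial_q\rho - \partial_q H\,\partial_p\rho + O(\hbar)$$

reduces to the classical Poisson bracket.

Note that

$$\{\hat p, \rho\} = \rho_q, \qquad \{\hat q, \rho\} = -\rho_p,$$

so this is consistent with

$$\dot\rho = (\dot q\,\partial_q + \dot p\,\partial_p)\rho = H_p\rho_q - H_q\rho_p$$

from Hamiltonian dynamics.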

To get the density in a position representation, write 
where 



p{p,x') = {2nh)-"' J p{x,y)e-P-^''^dx 
For normally ordered $H$, we have

H{p,q)p{x,y) = (27r^)— y H{p,x)p{p,y)e^--''Hp. 

Similarly, 

p{k, x) = {nh)-'^ J p{k +p,k- p)e-P'''^dx 
defines a CCR isomorphism with the momentum representation. 
Note that in the position and momentum representation,

$$\rho^*(x,x') = \rho(x',x), \qquad \rho^*(k,k') = \rho(k',k),$$

while in the phase space representation,

$$\rho^*(k,x) = \rho(-k,x).$$

This becomes natural in an analytic representation $z = x + ik$.

For numerical calculation, approximate $H \in \mathcal S(\Omega, E_0)$ by trigonometric functions (complex exponentials). This gives finite difference formulas. The initial density $\rho$ can be just given, or it can be regularized (using a Husimi function? GMMP 1.24)



Part V 

Representations and spectroscopy 
Chapter 14

Harmonic oscillators and coherent states

Part V applies the concepts introduced so far to the study of the dominant kinds of elementary motion in a bound system:

- vibrations, described by oscillators (Poisson representations of the Heisenberg group);
- rotations, described by a spinning top (Poisson representations of the rotation group);
- their interaction.

On the quantum level, quantum oscillators are always bosonic systems, while spinning systems may be bosonic or fermionic depending on whether or not the spin is integral. The analysis of experimental spectra, concentrating on the mathematical contents of the subject, concludes our discussion.

This chapter is a detailed study of harmonic oscillators (bosons, elementary vibrations), both from the classical and the quantum point of view. We introduce raising and lowering operators in the symplectic Poisson algebra, and show that the classical case is the limit $\hbar \to 0$ of the quantum harmonic oscillator.

The representation theory of the single-mode Heisenberg algebra is particularly simple since, by the Stone-von Neumann theorem, all unitary representations are equivalent. We find that the quantum spectrum of a harmonic oscillator is discrete. It consists of the classical frequency (multiplied by $\hbar$) and its nonnegative integral multiples (overtones, excited states).

To work in the representation where the harmonic oscillator Hamiltonian is diagonal, we introduce Dirac's bra-ket notation. We then deduce the basic properties of the bosonic Fock spaces, first for a single harmonic oscillator and then for a system of finitely many harmonic modes.

We then introduce coherent states, an overcomplete basis representation in which not only the Heisenberg algebra but also the action of the Heisenberg group is explicitly visible. Coherent states are quantum states that behave as classically as possible, thereby making a bridge between the quantum system and classical systems. The coherent state representation is particularly relevant for the study of quantum optics, but we only indicate its connection to the modes of the electromagnetic field.
14.1 The classical harmonic oscillator



The classical one-dimensional harmonic oscillator without damping, introduced in Section 2.2, is defined by the Hamiltonian

$$H = \frac{p^2}{2m} + V(q), \qquad (14.1)$$

where $V(q)$ is quadratic and bounded from below. Hence there are constants $q_0$, $V_0$ and $k > 0$ with

$$V(q) = V_0 + \frac{k}{2}(q - q_0)^2.$$

The number $k$ is called the stiffness; the greater the constant $k$, the more difficult it is to move away from equilibrium. The Hamilton equations are:

$$\dot q = \frac{p}{m}, \qquad \dot p = -V'(q) = -k(q - q_0).$$

A complex exponential ansatz shows that the solution of the Hamilton equations is

$$q(t) = q_0 + 2\,\mathrm{Re}\big(e^{i\omega t}x\big), \qquad p(t) = 2\,\mathrm{Re}\big(i\omega m\, e^{i\omega t}x\big),$$

where $x$ is a complex number depending on the initial conditions, and

$$\omega := \sqrt{\frac{k}{m}}$$

is the frequency of the harmonic oscillator. It is convenient to express the variables in terms of a so-called complex normal mode, the function $a(t)$ defined by

$$a(t) := \sqrt{\frac{m\omega}{2}}\,\big(q(t) - q_0\big) + \frac{i}{\sqrt{2m\omega}}\,p(t).$$

One can recover $q$ and $p$ through

$$q(t) = q_0 + \frac{a(t) + a^*(t)}{\sqrt{2m\omega}}, \qquad p(t) = -i\sqrt{\frac{m\omega}{2}}\,\big(a(t) - a^*(t)\big), \qquad (14.2)$$

hence the description by a normal mode is equivalent to the original description. Differentiating $a(t)$ and using $\dot q = p/m$ and $k = m\omega^2$, we obtain

$$\dot a(t) = \sqrt{\frac{m\omega}{2}}\,\frac{p(t)}{m} - \frac{ik}{\sqrt{2m\omega}}\,\big(q(t) - q_0\big) = -i\omega\, a(t).$$

We conclude that $a(t)$ has to obey

$$a(t) = a(0)e^{-i\omega t}.$$
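The claim $a(t) = a(0)e^{-i\omega t}$, together with the inversion (14.2), can be checked numerically against the closed-form solution of the Hamilton equations. The following is a minimal Python sketch. The parameter values for $m$, $k$, $q_0$ and the initial data are arbitrary illustrative choices, and the function names are hypothetical.

```python
import cmath
import math

m, k, q0 = 2.0, 8.0, 0.5       # mass, stiffness, equilibrium position (illustrative)
omega = math.sqrt(k / m)       # omega = sqrt(k/m)

def normal_mode(q, p):
    """a = sqrt(m omega/2) (q - q0) + i p / sqrt(2 m omega)."""
    return math.sqrt(m * omega / 2) * (q - q0) + 1j * p / math.sqrt(2 * m * omega)

def from_normal_mode(a):
    """Recover (q, p) from a as in (14.2)."""
    q = q0 + (a + a.conjugate()).real / math.sqrt(2 * m * omega)
    p = (-1j * math.sqrt(m * omega / 2) * (a - a.conjugate())).real
    return q, p

def exact_solution(t, q_init, p_init):
    """Closed-form solution of dq/dt = p/m, dp/dt = -k (q - q0)."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    q = q0 + (q_init - q0) * c + p_init / (m * omega) * s
    p = p_init * c - m * omega * (q_init - q0) * s
    return q, p

q_init, p_init = 1.3, -0.4
a0 = normal_mode(q_init, p_init)
for t in (0.0, 0.7, 2.5):
    a_t = normal_mode(*exact_solution(t, q_init, p_init))
    assert abs(a_t - a0 * cmath.exp(-1j * omega * t)) < 1e-12   # a(t) = a(0) exp(-i omega t)
```

The normal mode rotates uniformly in the complex plane, which is why it diagonalizes the dynamics.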

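We calculate the Lie product of $a$ and $a^*$ and find

$$a \angle a^* = \frac{\partial a}{\partial p}\frac{\partial a^*}{\partial q} - \frac{\partial a^*}{\partial p}\frac{\partial a}{\partial q} = \frac{i}{\sqrt{2m\omega}}\sqrt{\frac{m\omega}{2}} + \frac{i}{\sqrt{2m\omega}}\sqrt{\frac{m\omega}{2}} = i,$$

that is, we obtain the relation

$$a \angle a^* = i. \qquad (14.3)$$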

The relation (14.3) is called the canonical commutation relation (CCR) for the harmonic oscillator. More generally, one finds for the Lie product of general functions $f, g$ of $a$ and $a^*$ the formula

$$f \angle g = i\frac{\partial f}{\partial a}\frac{\partial g}{\partial a^*} - i\frac{\partial f}{\partial a^*}\frac{\partial g}{\partial a}. \qquad (14.4)$$

This will be seen later as a special case of a general principle for constructing so-called Lie-Poisson algebras from a Lie algebra.



14.2 Quantizing the harmonic oscillator 

For a classical harmonic oscillator, the Lie product in the CCR (14.3) is defined via the Poisson bracket. To quantize the harmonic oscillator, all we do is replace the Lie product in the CCR by its quantum analogue. Thus we postulate the existence of an operator $a$ and its conjugate $a^*$ with the relation

$$\frac{i}{\hbar}[a, a^*] = i,$$

equivalently

$$[a, a^*] = \hbar. \qquad (14.5)$$

Note that equation (14.5) has the right behavior under $\hbar \to 0$. In the limit where $\hbar$ goes to zero, we have to end up in the classical regime, where the operators $a$ and $a^*$ become functions on phase space and hence commute.

Equation (14.5) defines a *-algebra, i.e., an associative algebra with unity and an involution $*$, generated by $a$ with the relation $aa^* - a^*a = \hbar$. Later we look for representations in a Hilbert space, where the involution then corresponds to Hermitian conjugation. But already at this level, we call expressions in $a$ and $a^*$ operators.

The quantum mechanical Hamiltonian for the harmonic oscillator is the operator given by direct substitution of the $p$ and $q$ from (14.2) into (14.1):

$$H = \frac{p^2}{2m} + \frac{k}{2}(q - q_0)^2 + V_0 = \frac{\omega}{2}(aa^* + a^*a) + V_0 = \omega a^*a + \frac12\omega\hbar + V_0. \qquad (14.6)$$

Since only differences in energy are important, one often chooses $V_0 = -\frac12\hbar\omega$ to get the simple formula $H = \omega a^*a$.

In the classical theory we have commuting variables $a$ and $a^*$ with a Lie product $a\angle a^* = i$. That is, we have a commutative *-Poisson algebra. In the quantum theory we have an associative algebra generated by $a$ and $a^*$ with the relation $aa^* - a^*a = \hbar$.

In the quantum theory, two seemingly different polynomial expressions (such as $aa^*$ and $a^*a + \hbar$) can be the same. Hence there is a need for a preferred ordering of $a$ and $a^*$ in monomials. The normal ordering is that ordering of $a$ and $a^*$ in monomials where all $a^*$'s are moved to the left of the $a$'s. It is easy to see that every noncommutative polynomial in $a$ and $a^*$ can be normally ordered by repeated use of the relation $aa^* = a^*a + \hbar$. In the process of normal ordering, lower degree monomials are generated with higher powers of $\hbar$. We give the following proposition, which guarantees that taking $\hbar \to 0$ we recover the classical theory:

14.2.1 Proposition. Let $f$ and $g$ be noncommutative polynomials in $a$, $a^*$ and $\hbar$. Viewing $f$ and $g$ as polynomials in commuting variables $a$ and $a^*$, one can calculate $f \angle g$ using (14.4). As noncommutative polynomials, one can calculate the commutator $[f,g] = fg - gf$. The two results are related by

$$i[f,g] = \hbar\, f\angle g + O(\hbar^2). \qquad (14.7)$$

One expresses this relation by saying that the quantum Lie product is a deformation of the classical Lie product.

Proof. The order of the $a$ and $a^*$ does not matter, since changing the order only generates terms with higher powers of $\hbar$. We use induction on the degree of the polynomials. For degree zero and one, (14.7) holds.

Suppose it holds for $f$ of degree smaller than $n$ and $g$ of degree one. Write $f = a^*S + Ta$ for some normally ordered polynomials $S$ and $T$ with degrees smaller than $n$. Then for $g = a^*$ we see that

$$i[a^*S + Ta, a^*] = ia^*[S,a^*] + i\hbar T + i[T,a^*]a = \hbar a^*(S\angle a^*) + i\hbar T + \hbar (T\angle a^*)a + O(\hbar^2)$$
$$= \hbar (a^*S)\angle a^* + \hbar (Ta)\angle a^* + O(\hbar^2),$$

and the result holds for arbitrary $f$ and $g = a^*$. For $g = a$ the argument is similar.

Suppose the claim holds for all $g$ of degree $k$ with $k \le n$. Then for degree $n+1$ let us write $g = aP + a^*Q + R$, where $P$, $Q$ and $R$ are polynomials of degree strictly less than $n+1$. Then we have

$$\frac{i}{\hbar}[f, aP + a^*Q + R] = \frac{i}{\hbar}[f,a]P + \frac{i}{\hbar}a[f,P] + \frac{i}{\hbar}[f,a^*]Q + \frac{i}{\hbar}a^*[f,Q] + \frac{i}{\hbar}[f,R]$$
$$= (f\angle a)P + a(f\angle P) + (f\angle a^*)Q + a^*(f\angle Q) + f\angle R + O(\hbar)$$
$$= f\angle(aP) + f\angle(a^*Q) + f\angle R + O(\hbar) = f\angle(aP + a^*Q + R) + O(\hbar). \qquad (14.8)$$

And the proof is complete. □
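Proposition 14.2.1 can be tested on concrete polynomials by implementing normal ordering explicitly. The following Python sketch is illustrative. It stores normally ordered elements as dictionaries keyed by $(m, n, r)$ for $\hbar^r (a^*)^m a^n$. It reorders products with the standard identity

$$a^n (a^*)^m = \sum_k \binom{n}{k}\binom{m}{k} k!\,\hbar^k (a^*)^{m-k}a^{n-k},$$

which follows from $[a,a^*] = \hbar$. It then compares the $\hbar^1$ part of $i[f,g]$ with the classical product (14.4). The test polynomials and helper names are hypothetical.

```python
from math import comb, factorial

# Elements are dicts {(m, n, r): c} meaning c * hbar**r * (a*)**m * a**n (normally ordered).

def add(f, g, c=1):
    out = dict(f)
    for key, v in g.items():
        out[key] = out.get(key, 0) + c * v
    return {key: v for key, v in out.items() if v != 0}

def mul(f, g):
    """Noncommutative product, normal-ordering the middle factor a**n1 (a*)**m2."""
    out = {}
    for (m1, n1, r1), c1 in f.items():
        for (m2, n2, r2), c2 in g.items():
            for k in range(min(n1, m2) + 1):
                key = (m1 + m2 - k, n1 + n2 - k, r1 + r2 + k)
                weight = comb(n1, k) * comb(m2, k) * factorial(k)
                out[key] = out.get(key, 0) + c1 * c2 * weight
    return add(out, {})

def classical_lie(f, g):
    """(14.4) with a, a* treated as commuting variables (f, g free of hbar)."""
    def d_a(h):
        return {(m, n - 1, r): n * c for (m, n, r), c in h.items() if n > 0}
    def d_astar(h):
        return {(m - 1, n, r): m * c for (m, n, r), c in h.items() if m > 0}
    def cmul(u, v):
        out = {}
        for (m1, n1, r1), c1 in u.items():
            for (m2, n2, r2), c2 in v.items():
                key = (m1 + m2, n1 + n2, r1 + r2)
                out[key] = out.get(key, 0) + c1 * c2
        return out
    diff = add(cmul(d_a(f), d_astar(g)), cmul(d_astar(f), d_a(g)), -1)
    return {key: 1j * v for key, v in diff.items()}

f = {(2, 1, 0): 1, (0, 3, 0): 2}     # (a*)^2 a + 2 a^3
g = {(1, 2, 0): 3, (3, 0, 0): -1}    # 3 a* a^2 - (a*)^3

quantum = {key: 1j * v for key, v in add(mul(f, g), mul(g, f), -1).items()}   # i[f, g]
order0 = {key: v for key, v in quantum.items() if key[2] == 0}                 # hbar^0 part
order1 = {(m, n, 0): v for (m, n, r), v in quantum.items() if r == 1}          # hbar^1 part
classical = classical_lie(f, g)
```

One expects `order0` to be empty and `order1` to coincide with `classical`. The higher powers of $\hbar$ in `quantum` are the deformation terms.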



Extension to the anharmonic case. The anharmonic oscillator can in principle be 
treated in a similar fashion. Since the classical Lie product (the Poisson bracket) is the 
same, we may proceed exactly as before, except that the formulas involving the Hamiltonian 

are no longer valid. In particular, since the frequency $\omega$ was determined by the Hamiltonian, it is now an arbitrary constant. Thus there are multiple, inequivalent ways of defining the quantities $a(t)$. Indeed, there is even more freedom, since the only important property to be preserved is the canonical commutation relation.

Generalizing the affine form of $a(t)$ in the harmonic case, we choose it as an arbitrary affine combination of $q(t)$ and $p(t)$,

$$a(t) = \lambda + \mu q(t) + i\nu p(t), \qquad (14.9)$$

for suitable complex numbers $\mu$, $\nu$ and $\lambda$. As can be easily verified, the canonical commutation relation (14.3) is reproduced exactly when the restriction

$$2\,\mathrm{Re}\,\mu\bar\nu = 1$$

holds. In that case, the classical Lie product again takes the form (14.4). Having made a choice, we obtain a classical Hamiltonian $H = H(a, a^*)$ in terms of $a$ and $a^*$. Using the Heisenberg dynamics and (14.4), we obtain

$$\dot a(t) = H\angle a(t) = -i\frac{\partial H}{\partial a^*}. \qquad (14.10)$$

We remark that if (14.3) holds for $t = 0$, then it holds for all $t$. Indeed, the derivative of the left-hand side of (14.3) vanishes identically.

Using a different choice of the parameters defining $a$, we get a different variable $a'$, which is affinely related to the original $a$:

$$a' = \alpha + \beta a + \gamma a^*. \qquad (14.11)$$

The requirement that $a'$ satisfies the same commutation relations as $a$ leads to the restriction

$$|\beta|^2 - |\gamma|^2 = 1. \qquad (14.12)$$

A transformation of the form (14.11) satisfying (14.12) is called a Bogoliubov transformation. Bogoliubov transformations have important applications; for example, they were at the heart of Hawking's proof that black holes radiate. The generalization of Bogoliubov transformations to systems of oscillating electron pairs in metals is an important ingredient for the theory of Cooper pairs, which explains superconductivity effects in metals at low temperature.
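Both restrictions, $2\,\mathrm{Re}\,\mu\bar\nu = 1$ for (14.9) and $|\beta|^2 - |\gamma|^2 = 1$ for (14.12), can be checked with a few lines of arithmetic on affine functions. The following minimal Python sketch makes these choices:

- affine functions are represented by coefficient triples;
- brackets use the $(q,p)$ convention $u\angle v = \partial_p u\,\partial_q v - \partial_q u\,\partial_p v$ and formula (14.4), respectively;
- the parametrization $\beta = \cosh r\, e^{i\theta}$, $\gamma = \sinh r\, e^{i\varphi}$ is a common choice satisfying (14.12), used here only as an example;
- all numerical values and helper names are hypothetical.

```python
import cmath
import math

# Affine functions are triples of coefficients.

def lie_qp(u, v):
    """u∠v = du/dp dv/dq - du/dq dv/dp for u = c0 + c1 q + c2 p."""
    return u[2] * v[1] - u[1] * v[2]

def conj_qp(u):
    """Complex conjugate of c0 + c1 q + c2 p (q and p are real)."""
    return tuple(complex(c).conjugate() for c in u)

def lie_a(u, v):
    """(14.4) for u = c0 + c1 a + c2 a*: u∠v = i (c1 d2 - c2 d1)."""
    return 1j * (u[1] * v[2] - u[2] * v[1])

def conj_a(u):
    """Complex conjugate of c0 + c1 a + c2 a*, i.e. conj(c0) + conj(c2) a + conj(c1) a*."""
    return (complex(u[0]).conjugate(), complex(u[2]).conjugate(), complex(u[1]).conjugate())

# (14.9): a = lam + mu q + i nu p, with nu chosen so that 2 Re(mu conj(nu)) = 1.
lam, mu = 0.3 - 0.1j, 0.7 + 0.2j
nu = 1 / (2 * mu.conjugate())
a = (lam, mu, 1j * nu)

# (14.11) with |beta|^2 - |gamma|^2 = cosh(r)^2 - sinh(r)^2 = 1, as required by (14.12).
r, theta, phi = 0.8, 0.3, -1.1
alpha = 0.5 - 0.2j
beta = math.cosh(r) * cmath.exp(1j * theta)
gamma = math.sinh(r) * cmath.exp(1j * phi)
a_prime = (alpha, beta, gamma)

print(lie_qp(a, conj_qp(a)), lie_a(a_prime, conj_a(a_prime)))   # both approximately 1j
```

The constant terms $\lambda$ and $\alpha$ drop out of every bracket, which is why they are unconstrained.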

Different choices of the coefficients in the definition of $a$ lead, of course, to different forms of $H(a,a^*)$. This means that different Hamiltonians $H(a,a^*)$ can describe the same oscillator. The particular choice above for the harmonic oscillator is the one leading to $H(a,a^*) = E_0 + \omega a^*a$, for which the dynamics (14.10) takes the simple form $\dot a = -i\omega a$.

In theoretical physics there are often several operators $a_k$ and $a_k^*$, labeled by some parameter $k$. One tries to find by means of Bogoliubov transformations the simplest form of the Hamiltonian. The preferred form is the one where $H$ is diagonalized: $H = \sum_k \omega_k a_k^* a_k + \dots$, where the dots contain terms of higher order in the operators $a_k^*$ and $a_k$.

The quantization of an anharmonic oscillator is done as in the harmonic case. For each 
classical Hamiltonian polynomial in a and a*, there is a unique normally ordered quantum 
version. However, when modeling the same system both in a classical and in a quantum setting, the coefficients of the quantum system in a normal ordering of the operators must be taken to depend on $\hbar$. The form of this dependence is not determined by the quantum-classical correspondence. Therefore, the best fit of coefficients of $H$ to experimental data will generally produce different optimal values in the classical and the quantum case. In a quantum field theory, the coefficients will also depend on the scale at which frequencies remain unresolved. This gives so-called running coupling constants, which play an important role in renormalization techniques.



14.3 Representations of the Heisenberg algebra 

We saw at the end of Section 14.1 that the Heisenberg algebra $\mathfrak h(3,\mathbb C)$ can be considered as being generated by $1$, $a$, and $a^*$, where $a$ and $a^*$ satisfy the CCR $a\angle a^* = i$.

In the classical case, we know a realization of these commutation relations in terms of a 
Poisson bracket. In the quantum case, we must find a representation in terms of operators 
in a Hilbert space. The representations of physical interest are the unitary representations, 
which represent the one as identity and behave properly under the *-operation. In this 
section we construct a unitary representation of the Heisenberg algebra. 

In the quantized version of a classical theory, the functions on phase space become elements of some associative algebra $E$. For a representation, we want to realize the algebra $E$ as a subalgebra of an algebra of linear operators.

The approach of Schrödinger (1926) to this problem was to take as Hilbert space the space of square integrable complex-valued functions on $\mathbb R^3$. Then the Schrödinger equation for the dynamics of a pure state takes the form of a wave equation. Wave equations were familiar to physicists at that time, and hence this form came to dominate quantum mechanics. The approach taken by Schrödinger proved to be very successful and is also presented in many quantum physics textbooks.

For a single particle, Schrödinger's representation is quite intuitive: the real-valued function $|\psi(x)|^2$ is the probability density for the presence of a particle at a point $x$ in space. For multiparticle systems, this intuitive advantage is lost. The wave functions are no longer defined on physical space but, for $n$ particles, on an abstract $3n$-dimensional configuration space.

Things are even more complicated for systems involving an unconserved number of particles (in particular for interactions with light) and for systems in the thermodynamic limit. There the configuration space becomes infinite-dimensional, and the wave function representation becomes unwieldy. Nevertheless, there are interesting papers using the resulting functional Schrödinger equation to illuminate the relations between classical solitons and quantum bound states (see, e.g., Jackiw [120]).

One year earlier than Schrödinger, Heisenberg invented his infinite-dimensional matrix algebra. We present Heisenberg's approach since it generalizes easily to the most complex quantum systems, including the universe as a whole.
We now look at an arbitrary unitary representation $J: L \to \mathrm{Lin}\,H$ in a Euclidean space $H$ satisfying

$$J(a^*) = J(a)^*, \qquad J(1) = 1.$$

We shall write the operators corresponding to $a$ and $a^*$ in the representation again as $a$ and $a^*$ (rather than using $J(a)$, etc.), in order to avoid clumsy notation. This will not cause problems, since the representation turns out to be faithful. Then the operator

$$n := \frac{1}{\hbar}a^*a,$$

called the number operator for reasons that will soon be apparent, satisfies the commutation relations

$$[a, n] = a, \qquad [a^*, n] = -a^*,$$

as is easily checked. This implies that the vector space generated by $1$, $a$, $a^*$ and $n$ is closed under the commutator. Hence it forms a Lie *-algebra $L$ with the quantum Lie product, called the oscillator algebra $os(1)$. In this section (as always when classifying unitary representations), it will be more convenient to work directly with commutators.



We now illustrate an important technique in representation theory, which in many cases of 
interest provides all irreducible representations of a certain kind. See Section 16.4 for some 
other applications. 

We define the Verma module corresponding to a complex number $\lambda$ by

$$V_\lambda := \{\psi \in \overline H \mid n\psi = \lambda\psi\},$$

where the Hilbert space $\overline H$ is the closure of $H$. Since $V_\lambda$ is a vector space closed under multiplication by operators from $L_0$, the space $V_\lambda$ is an $L_0$-module.

If $V_\lambda \ne 0$, i.e., if it contains a nonzero vector, then $\lambda$ is an eigenvalue of $n$, and any nonzero $\psi \in V_\lambda$ is a corresponding eigenvector. Thus the nonzero Verma modules are just the eigenspaces of the eigenvalues of $n$. Since we consider here only unitary representations where $*$ is the adjoint, this implies that

$$\lambda = \frac{\psi^*n\psi}{\psi^*\psi} = \frac{\|a\psi\|^2}{\hbar\|\psi\|^2}$$

is real and nonnegative. Noting that in general $n$ is Hermitian, we now make the slightly stronger assumption that $n$ is self-adjoint as a densely defined operator of the Hilbert space $\overline H$. Then the spectral theorem implies that the infimum

$$\lambda_0 := \inf_{\psi \ne 0}\frac{\psi^*n\psi}{\psi^*\psi}$$

is a real and nonnegative number, attained for some $\psi_0 \ne 0$. Moreover, $\psi_0$ is an eigenvector of $n$ corresponding to the eigenvalue $\lambda_0$. Thus $V_{\lambda_0} \ne 0$.

Now consider an arbitrary $\lambda$ with $V_\lambda \ne 0$ and a nonzero $\psi \in V_\lambda$. Then $na\psi = (\lambda - 1)a\psi$, hence $a\psi \in V_{\lambda-1}$. If $a\psi = 0$ then

$$\lambda\psi^*\psi = \psi^*n\psi = \frac1\hbar\psi^*a^*a\psi = 0,$$

hence $\lambda = 0$. If $a\psi \ne 0$, then $V_{\lambda-1} \ne 0$, and $\lambda - 1$ is an eigenvalue of $n$. In the latter case, we can repeat the step once, or more often. But since all eigenvalues are nonnegative, this can happen only a finite number of times, and ultimately we must end up with the other alternative. Hence zero is an eigenvalue (in particular $\lambda_0 = 0$), and $\lambda - m = 0$ for some nonnegative integer $m$. Thus the only possible eigenvalues are nonnegative integers.

That all these actually are eigenvalues follows by a similar argument. Indeed, with $\psi$ as before, we have

$$na^*\psi = ([n,a^*] + a^*n)\psi = a^*(n+1)\psi = (\lambda+1)a^*\psi,$$

hence $a^*\psi \in V_{\lambda+1}$. Since

$$\|a^*\psi\|^2 = \psi^*aa^*\psi = \psi^*(\hbar + a^*a)\psi = \hbar\|\psi\|^2 + \|a\psi\|^2 \ge \hbar\|\psi\|^2 > 0,$$

$V_{\lambda+1} \ne 0$ and $a^*\psi$ is an eigenvector for the eigenvalue $\lambda + 1$. By induction, we reach all positive integers from $V_{\lambda_0} = V_0$. Thus we have proved the following theorem:

14.3.1 Theorem. In a representation in which $n$ is self-adjoint, a Verma module $V_\lambda$ of the oscillator algebra $L$ is nonzero if and only if $\lambda$ is a nonnegative integer.

In particular, the spectrum of the Hamiltonian $H = \omega a^*a$ of a quantum harmonic oscillator consists of the nonnegative integral multiples of $\hbar\omega$.
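A truncated matrix model makes the ladder structure of the theorem visible. The following Python sketch builds the lowering operator on the span of the first $N$ eigenvectors of $n$, using $a e_k = \sqrt{\hbar k}\, e_{k-1}$. It then checks that $H = \omega a^*a$ has diagonal entries $\hbar\omega k$. The truncation is an assumption of the sketch, not part of the text. No finite matrices can satisfy $[a,a^*] = \hbar$ exactly, since a commutator has trace zero, so the last diagonal entry of the commutator necessarily deviates. The parameter values and helper names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def lowering(N, hbar):
    """Truncated lowering operator on span(e_0, ..., e_{N-1}): a e_k = sqrt(hbar k) e_{k-1}."""
    a = [[0.0] * N for _ in range(N)]
    for k in range(1, N):
        a[k - 1][k] = math.sqrt(hbar * k)
    return a

N, HBAR, OMEGA = 6, 0.5, 3.0     # illustrative truncation size and constants
a = lowering(N, HBAR)
a_star = transpose(a)            # real matrix, so the adjoint is the transpose
H = [[OMEGA * x for x in row] for row in matmul(a_star, a)]          # H = omega a* a
comm = [[x - y for x, y in zip(r1, r2)]
        for r1, r2 in zip(matmul(a, a_star), matmul(a_star, a))]    # [a, a*]

energies = [H[k][k] for k in range(N)]   # expected: hbar * omega * k for k = 0, ..., N-1
print(energies)
```

Column $0$ of `a` is zero, which is the matrix form of the statement that the ground state is annihilated by the lowering operator.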

The results obtained justify the following terminology. The operator $a$ is called a lowering operator, since its application to an eigenstate of the number operator $n$ lowers the associated eigenvalue by one. The operator $a^*$ is called a raising operator, since its application to an eigenstate of the number operator $n$ raises the associated eigenvalue by one. Together, the operators $a$ and $a^*$ are called ladder operators. A unit vector in the Verma module $V_0$ is called a ground state (in the second quantized language of quantum field theory, a vacuum vector). For a ground state, i.e., a nonzero vector $\psi \in V_0$, we have $n\psi = 0$, hence $\|a\psi\|^2 = \psi^*a^*a\psi = \hbar\psi^*n\psi = 0$, and therefore $a\psi = 0$. Thus the ground state is annihilated by the lowering operator. Therefore $a$ is also called an annihilation operator; if this term is used then $a^*$ is called a creation operator.
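The ladder picture can be checked on a finite matrix model. This is a minimal sketch that assumes units with $\hbar = 1$ and truncates an orthonormal eigenbasis $e_0,\dots,e_{d-1}$ of $n$ to a few vectors (the size `dim` is a hypothetical choice). The truncation breaks $[a,a^*] = 1$ in the last diagonal entry, so only the lower levels are meaningful.

```python
import math

def lowering_matrix(dim):
    """Truncated matrix of a in an orthonormal eigenbasis e_0, ..., e_{dim-1} of n
    (assumed units hbar = 1): a e_k = sqrt(k) e_{k-1}."""
    a = [[0.0] * dim for _ in range(dim)]
    for k in range(1, dim):
        a[k - 1][k] = math.sqrt(k)
    return a

def transpose(m):
    # the matrices here are real, so the adjoint is just the transpose
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][l] * y[l][j] for l in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def unit(k, dim):
    return [1.0 if j == k else 0.0 for j in range(dim)]

dim = 6
a = lowering_matrix(dim)
a_star = transpose(a)
n = matmul(a_star, a)                    # n = a* a: diagonal with entries 0, 1, ..., dim-1

spectrum = [n[k][k] for k in range(dim)]
lowered = matvec(a, unit(3, dim))        # a e_3 = sqrt(3) e_2: eigenvalue lowered from 3 to 2
raised = matvec(a_star, unit(3, dim))    # a* e_3 = 2 e_4: eigenvalue raised from 3 to 4
ground_image = matvec(a, unit(0, dim))   # a e_0 = 0: the ground state is annihilated
```

The diagonal of `n` reproduces the nonnegative integers of Theorem 14.3.1, up to floating-point rounding.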



14.4 Bras and Kets 

In his groundbreaking work on quantum mechanics, Dirac introduced a notation for vectors 
and operators that is widely used by physicists but is quite different from what mathemati- 
cians are used to. Dirac's bra-ket calculus is not very well defined in the way actually used 
by physicists, since the basis vectors considered in the calculus do not necessarily lie in the 
Hilbert space in which everything should happen from a strictly axiomatic point of view. 

We define here a precise version of Dirac's bra-ket calculus, which can also satisfy mathematicians. Instead of working in a Hilbert space we consider a fixed dense subspace which we denote by $\mathbb H$. Thus $\mathbb H$ is a vector space with a Hermitian inner product $\langle\cdot|\cdot\rangle$, antilinear in the first argument and linear in the second, such that $\langle\psi|\psi\rangle$ is always real and nonnegative, and the relation

$$\langle\phi|\psi\rangle^* = \langle\psi|\phi\rangle \qquad (14.13)$$

holds, where $\alpha^*$ denotes the complex conjugate of a number $\alpha \in \mathbb C$. The inner product defines a Euclidean norm $\|\psi\| := \sqrt{\langle\psi|\psi\rangle}$, and the Hilbert space is the closure $\overline{\mathbb H}$ of $\mathbb H$ in the topology induced by this norm. We refer to the elements of $\mathbb H$ as smooth vectors since they correspond in the important special case $\mathbb H = C^\infty(M)$ to arbitrarily often differentiable functions.

Every smooth vector $\psi$ defines a continuous linear functional, denoted by $\psi^*$, which maps $\phi \in \mathbb H$ to the complex number

$$\psi^*(\phi) := \langle\psi|\phi\rangle. \qquad (14.14)$$

Dirac's idea was to turn this formula into a more suggestive form by splitting the bracket $\langle\psi|\phi\rangle$ into a bra $\langle\psi|$ standing for $\psi^*$, and a ket $|\phi\rangle$ standing for $\phi$, and deleting the now superfluous parentheses. Then the formula becomes

$$\langle\psi|\,|\phi\rangle := \langle\psi|\phi\rangle, \qquad (14.15)$$

which just asks us to replace two adjacent vertical bars by a single one.

If $\mathbb H$ is itself a Hilbert space (and in particular, if the dimension of $\mathbb H$ is finite) then it is not difficult to see that all continuous linear functionals arise in this way. However, in many interesting infinite-dimensional vector spaces $\mathbb H$, the situation is different. For example, if $\mathbb H = C^\infty(\mathbb R)$ and $z \in \mathbb R$ then the mapping $\delta_z$ which maps $\phi \in \mathbb H$ to

$$\delta_z(\phi) := \phi(z)$$

is a continuous linear functional which cannot be obtained as $\psi^*$ for some smooth vector $\psi$.

We can accommodate this in the bra-ket calculus by allowing as bras all continuous linear functionals rather than only those which have the form $\psi^*$ with $\psi \in \mathbb H$. We simply need to label the continuous linear functionals as bras with symbols $\psi$ from a set $\mathbb H^*$ such that the functionals of the form $\psi^*$ with $\psi \in \mathbb H$ get the label $\psi$. The set $\mathbb H^*$ can be made canonically into a vector space containing $\mathbb H$ as a subspace by requiring the mapping $*: \psi \mapsto \psi^* := \langle\psi|$ to be antilinear. Then $\mathbb H^* = \mathbb H$ in case $\mathbb H$ is a Hilbert space, but in general $\mathbb H$ may be a proper subspace of $\mathbb H^*$. Since the inner product extends continuously from $\mathbb H$ to the Hilbert space completion $\overline{\mathbb H}$, every element of $\overline{\mathbb H}$ defines a continuous linear functional. Thus, in general, the Hilbert space $\overline{\mathbb H}$ sits somewhere in between $\mathbb H$ and $\mathbb H^*$,

$$\mathbb H \subseteq \overline{\mathbb H} \subseteq \mathbb H^*. \qquad (14.16)$$

Frequently, some extra "nuclear" structure on $\mathbb H$ is assumed which turns (14.16) into a so-called Gelfand triple or rigged Hilbert space (see, e.g., Maurin [170], Bohm & Gadella [34], and for applications to resonances Kukulin et al. [147]); however, on the level of our discussion, we don't need this extra structure.

If $\mathbb H = C^\infty(\mathbb R)$, physicists call the vectors $\psi \in \mathbb H^*$ wave functions (well aware that they are not always functions in the standard sense), and write them with a dummy argument $x$ as $\psi(x)$. For example, they consider $\delta_z$ to be a shifted delta function, and write it as $\delta(x-z)$.

A wave function which is in the Hilbert space $\overline{\mathbb H} \subseteq \mathbb H^*$ is called normalizable; the remaining wave functions are called non-normalizable. In mathematical terms, the normalizable wave functions are equivalence classes of square integrable functions, with two functions being regarded as equivalent when they differ only on a set of measure zero. The shifted delta functions are examples of non-normalizable wave functions.

For a general Euclidean space $\mathbb H$, we refer to the elements of $\mathbb H^*$ as rough vectors since they correspond in the special case $\mathbb H = C^\infty(\mathbb R)$ to functions that are less smooth, possibly not even continuous, and possibly (as in the case of the $\delta_z$) not functions at all.

Having extended the bra-ket notation to allow rough vectors as labels in bras, the symmetry property (14.13) is lost. To restore it, we simply extend the inner product to enforce the validity of (14.13) by defining $\langle\psi|\phi\rangle := \langle\phi|\psi\rangle^*$ if $\phi \in \mathbb H^*$ and $\psi \in \mathbb H$. This can be done consistently, and implies that now kets can be labeled by rough vectors, too. But now the formula (14.15) makes trouble. What is $\langle\phi|\psi\rangle$ when both $\phi$ and $\psi$ are rough vectors? In general, there is no solution; this product cannot always be defined. However, one can consistently define it in certain cases, namely when $\psi$ lies in some subspace $\mathbb H'$ of $\mathbb H^*$ and the linear functional $\phi^*$, defined at first only on $\mathbb H$, can be extended to $\mathbb H'$ by some limiting procedure. We won't list here the various possibilities; our usage of bras and kets will be restricted to cases where at least one of the two labels in an inner product is smooth.

The main use of Dirac's notation is for the specification of vectors and matrices in a particular representation of the algebra of quantities. We first review the notation in the case where a countable orthonormal basis of smooth states is available. In this case there is a countable set $K$ of labels such that the basis consists of the kets $|k\rangle$ with $k \in K$. Orthonormality gives

$$\langle j|k\rangle = \delta_{jk},$$

and completeness of the basis gives the resolution of unity

$$\sum_k |k\rangle\langle k| = 1.$$

In the finite-dimensional case, there is a close correspondence to the notation of linear algebra if we take $\langle k|$ to be the $k$th unit row vector with a 1 in position $k$ and zeros elsewhere, and $|k\rangle$ to be its transpose, the $k$th unit column vector. Then

$x = \sum_k x_k|k\rangle$ represents the vector $x = (x_k)$, and $\langle k|x\rangle = x_k$ gives the components of $x$;
$A = \sum_{j,k} A_{jk}|j\rangle\langle k|$ represents the matrix $A = (A_{jk})$;
$\langle j|A = \sum_k A_{jk}\langle k|$ represents $A_{j:}$, the $j$th row of $A$;
$A|k\rangle = \sum_j A_{jk}|j\rangle$ represents $A_{:k}$, the $k$th column of $A$;
and $\langle j|A|k\rangle = A_{jk}$ gives the matrix entries of $A$.

Compared to the standard linear algebra notation, there is little gain from the more lengthy bra-ket notation when $K$ is a simple index set.

The situation is different when $K$ is a structured set, for example a set of pairs $(k,s)$ where $k$ is a momentum label and $s$ a spin label, or other such sets arising naturally in the dynamical symmetry approach of Section 17.6. Then the index notation becomes somewhat cumbersome to comprehend, and the more lengthy bra-ket notation is superior.
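The finite-dimensional dictionary above can be spelled out with plain lists. This is a minimal sketch with a hypothetical $3\times3$ matrix `A`. Bras and kets are real unit vectors here, so no complex conjugation is needed. The sketch checks the resolution of unity, the expansion $A = \sum_{jk}A_{jk}|j\rangle\langle k|$, the entries $\langle j|A|k\rangle = A_{jk}$, and the row $\langle j|A$.

```python
def ket(k, dim):
    """|k>: the k-th unit column vector, stored as a flat list."""
    return [1 if i == k else 0 for i in range(dim)]

def bra(k, dim):
    """<k|: the k-th unit row vector (real entries, so it agrees with |k> entrywise)."""
    return [1 if i == k else 0 for i in range(dim)]

def outer(col, row):
    """The matrix |u><v| built from a column u and a row v."""
    return [[c * r for r in row] for c in col]

def add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def scale(s, x):
    return [[s * p for p in row] for row in x]

def sandwich(row, m, col):
    """<u|A|v> = row * A * col."""
    return sum(row[i] * m[i][j] * col[j] for i in range(len(row)) for j in range(len(col)))

def row_times(row, m):
    """<j|A as a row vector."""
    return [sum(row[i] * m[i][j] for i in range(len(row))) for j in range(len(m[0]))]

dim = 3
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
zero = [[0] * dim for _ in range(dim)]

# resolution of unity: sum_k |k><k| = 1
unity = zero
for k in range(dim):
    unity = add(unity, outer(ket(k, dim), bra(k, dim)))

# A = sum_{jk} A_jk |j><k|
rebuilt = zero
for j in range(dim):
    for k in range(dim):
        rebuilt = add(rebuilt, scale(A[j][k], outer(ket(j, dim), bra(k, dim))))

# <j|A|k> = A_jk
entries = [[sandwich(bra(j, dim), A, ket(k, dim)) for k in range(dim)] for j in range(dim)]
```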



14.5 Boson Fock space

As we have seen in Section 14.3, every nice unitary representation of the oscillator algebra contains a ground state $\psi_0$ of norm 1, and hence the representation contains the vectors $(a^*)^k\psi_0$ ($k = 0, 1, 2, \dots$). Their span defines a Euclidean vector space $\mathbb F_+$ whose closure $\overline{\mathbb F}_+$ is a Hilbert space, called the single mode bosonic Fock space, or simply Fock space. Clearly, $\mathbb F_+$ is closed under the action of $L$, hence we have a unitary representation of $L$ on $\mathbb F_+$. It is not difficult to see that different choices of the ground state either define the same Fock space (if the ground states differ only by a phase) or orthogonal Fock spaces. Moreover, if $\mathbb F \subseteq \mathbb F_+$ is a nonzero invariant submodule, it must contain a vector annihilated by $a$ (apply $a$ repeatedly to a nonzero element of $\mathbb F$), and such a vector necessarily coincides with the ground state $\psi_0$ of $\mathbb F_+$ up to a complex factor; hence $\mathbb F = \mathbb F_+$. Thus the representations on a Fock space are irreducible representations, and an arbitrary unitary representation is a direct sum of Fock spaces. We shall show in a moment that the unitary representation on a Fock space is essentially unique. This is the content of the celebrated Stone-Von Neumann theorem, which actually is about the representation of the Heisenberg group.

Bosonic Fock spaces with more degrees of freedom are obtained by taking tensor products of the Fock space with one degree of freedom, and describe systems of quantum oscillators. As we shall see in Chapter 15, there is also a fermionic counterpart of Fock spaces, which is related to so-called Clifford algebras. The single mode case describes a so-called qubit and is simply the vector space $\mathbb C^2$; the general case is a tensor product of these, and describes systems of qubits.

We now study the structure of $\mathbb F_+$ for a given ground state $\psi_0$ of norm 1 in more detail. The properties found will lead to a construction of a Hilbert space which actually contains a representation of the Heisenberg algebra (which, so far, we simply had assumed).

14.5.1 Proposition. The vectors

$$|k\rangle := \frac{1}{k!}(a^*)^k\psi_0 \qquad (k = 0, 1, 2, \dots) \qquad (14.17)$$

satisfy the relations (with $|-1\rangle := 0$)

$$a^*|k-1\rangle = k|k\rangle, \qquad a|k\rangle = \hbar|k-1\rangle, \qquad n|k\rangle = k|k\rangle, \qquad \langle k|k'\rangle = \frac{\hbar^k}{k!}\delta_{kk'}.$$
Proof. The first relation is just the definition. For the second observe that $a\psi_0 = 0$ and $[a,(a^*)^k] = k\hbar(a^*)^{k-1}$. For the third, just combine $n = \hbar^{-1}a^*a$ and the first and second relation. For the fourth relation we have $\langle k| = \frac{1}{k!}\psi_0^*a^k$, and $\langle k|k'\rangle = 0$ if $k \ne k'$ since eigenvectors of a Hermitian operator corresponding to different eigenvalues are orthogonal. So only the normalization needs to be checked:

$$\langle k|k\rangle = \frac{1}{k^2}\langle k-1|aa^*|k-1\rangle = \frac{1}{k^2}\langle k-1|\hbar + \hbar n|k-1\rangle = \frac{\hbar}{k}\langle k-1|k-1\rangle.$$

Using induction the fourth equality follows. □
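The normalization $\langle k|k\rangle = \hbar^k/k!$ can be cross-checked in an independent model. This minimal sketch assumes a truncated orthonormal basis in which $a e_j = \sqrt{\hbar j}\,e_{j-1}$, so that $[a,a^*] = \hbar$ holds on the untruncated levels. The value `HBAR` and the size `DIM` are arbitrary choices. The code builds $|k\rangle = a^*|k-1\rangle/k$ and compares norms and the lowering relation with Proposition 14.5.1.

```python
import math

HBAR = 0.7  # arbitrary positive value, used only for this check
DIM = 12    # size of the truncated orthonormal basis e_0, ..., e_{DIM-1} (hypothetical)

def apply_a(v):
    """a e_j = sqrt(hbar j) e_{j-1}, so that [a, a*] = hbar on the untruncated levels."""
    return [math.sqrt(HBAR * (j + 1)) * v[j + 1] for j in range(DIM - 1)] + [0.0]

def apply_a_star(v):
    """a* e_j = sqrt(hbar (j+1)) e_{j+1}; components beyond DIM-1 are dropped."""
    return [0.0] + [math.sqrt(HBAR * j) * v[j - 1] for j in range(1, DIM)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

psi0 = [1.0] + [0.0] * (DIM - 1)   # normalized ground state, a psi0 = 0

# |k> := (a*)^k psi0 / k!, built recursively as |k> = a*|k-1> / k
kets = [psi0]
for k in range(1, 6):
    kets.append([x / k for x in apply_a_star(kets[-1])])

norms = [dot(v, v) for v in kets]
predicted = [HBAR ** k / math.factorial(k) for k in range(len(kets))]
```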
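In the Fock space $\mathbb F_+$, the vectors are by definition the (finite) linear combinations

$$\psi = \sum_{k=0}^\infty \psi_k|k\rangle.$$

Proposition 14.5.1 gives us the relations

$$(a\psi)_k = \hbar\psi_{k+1}, \qquad (a^*\psi)_k = k\psi_{k-1}, \qquad (n\psi)_k = k\psi_k, \qquad (14.18)$$

and $\phi^*\psi = \big(\sum_k\phi_k|k\rangle\big)^*\sum_l\psi_l|l\rangle = \sum_{k,l}\overline{\phi_k}\psi_l\langle k|l\rangle$, hence

$$\phi^*\psi = \sum_k\frac{\hbar^k}{k!}\overline{\phi_k}\psi_k. \qquad (14.19)$$

Equations (14.18) and (14.19) are an equivalent description of the equations of Proposition 14.5.1.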
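Equations (14.18) and (14.19) can be implemented directly on finite coefficient lists. This is a minimal sketch that assumes a rational value of $\hbar$ and real rational coefficients, so that every check is exact and no complex conjugation is needed. The sketch verifies $[a,a^*] = \hbar$, $n = \hbar^{-1}a^*a$, and that $a$ and $a^*$ are adjoint with respect to (14.19).

```python
from fractions import Fraction
from math import factorial

HBAR = Fraction(3, 2)  # arbitrary rational value, so that all checks are exact

def pad(u, m):
    return list(u) + [Fraction(0)] * (m - len(u))

def a(psi):
    """(a psi)_k = hbar psi_{k+1}, cf. (14.18); psi is a finite coefficient list."""
    return [HBAR * psi[k + 1] for k in range(len(psi) - 1)]

def a_star(psi):
    """(a* psi)_k = k psi_{k-1}; the result is one coefficient longer."""
    return [Fraction(0)] + [k * psi[k - 1] for k in range(1, len(psi) + 1)]

def number(psi):
    """(n psi)_k = k psi_k."""
    return [k * c for k, c in enumerate(psi)]

def inner(phi, psi):
    """phi* psi = sum_k hbar^k / k! conj(phi_k) psi_k, cf. (14.19).
    Real (rational) coefficients only, so the conjugation is omitted."""
    m = max(len(phi), len(psi))
    phi, psi = pad(phi, m), pad(psi, m)
    return sum(HBAR ** k / factorial(k) * phi[k] * psi[k] for k in range(m))

psi = [Fraction(1), Fraction(-2), Fraction(5, 3)]
phi = [Fraction(2), Fraction(1, 2), Fraction(0), Fraction(4)]

# canonical commutation relation: (a a* - a* a) psi should equal hbar psi
commutator = [x - y for x, y in zip(a(a_star(psi)), a_star(a(psi)))]
```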

We now define $\mathbb H$ as the closure of $\mathbb F_+$. Then the operators $a$, $a^*$ and $n$ are defined on a dense subset. Previously we have seen that if the canonical commutation relations admit an irreducible representation, then it has to be of the form described by Proposition 14.5.1. But now we can say more:

The set $\mathbb H$ of sequences $\psi = (\psi_k)_{k\ge0}$ of complex numbers with finite norm

$$\|\psi\|^2 = \sum_{k=0}^\infty\frac{\hbar^k}{k!}|\psi_k|^2 < \infty$$

is a Hilbert space with inner product (14.19), on which the definitions (14.18) give densely defined operators $a$, $a^*$, $n$. The components of $a^*\psi$ grow significantly faster than those of $\psi$, so that $a^*\psi \in \mathbb H$ only for $\psi$ in a proper subspace of $\mathbb H$. This subspace is dense, since it contains the dense subset of $\psi$ with only finitely many nonzero entries. Note that the operators $1$, $a$, $a^*$ and $n$, and hence all elements of $L$, are represented by infinite tridiagonal matrices. This is the representation of the quantum harmonic oscillator discovered by Heisenberg in his groundbreaking paper [110].



It is now easy to check that the operators $1$, $a$, $a^*$ and $n$ satisfy the canonical commutation relations, that $a^*$ is the Hermitian conjugate of $a$, and that $n = \hbar^{-1}a^*a$. Thus we have a representation of $L$. The representation is irreducible since acting repeatedly with $a^* \in L$ on the vector $\psi$ with entries $\psi_k = \delta_{k0}$ (the ground state) gives a basis of $\mathbb H$. Combining this with the uniqueness statement obtained before, we arrive at the following theorem of Stone and Von Neumann (but essentially already obtained in [110]):

14.5.2 Theorem. The canonical commutation relations admit an (up to equivalence) unique irreducible unitary representation on a Hilbert space such that the action of $a$, $a^*$ and $n = \hbar^{-1}a^*a$ is defined on a dense subspace and $n$ is self-adjoint.

The theorem holds with a similar proof for arbitrary finite-dimensional Heisenberg algebras coming from a nondegenerate alternating form. It fails spectacularly in infinite dimensions. In this case there are uncountably many inequivalent representations; see, e.g., Barton [24] for an (in spite of the title of the book) elementary discussion of these. Their existence is one of the main stumbling blocks for extending quantum mechanics to quantum field theory.



14.6 Bargmann-Fock representation

We present an important but easy representation of the Heisenberg algebra $h(n)$, which will be useful to us when we study coherent states in Section 14.7. Consider the vector space of complex polynomials in $n$ variables $\mathbb C[z_1,\dots,z_n]$. We then identify $a_k$ and $a_k^*$ with the operators defined by^

$$(a_kp)(z_1,\dots,z_n) := \frac{\partial}{\partial z_k}p(z_1,\dots,z_n)$$

and

$$(a_k^*p)(z_1,\dots,z_n) := z_kp(z_1,\dots,z_n).$$

It is easy to check that this indeed defines a representation (with $\hbar = 1$, i.e., $[a_k,a_l^*] = \delta_{kl}$). We can even make a unitary representation out of this. For that purpose we consider the vector space $\mathbb H$ of all entire functions on $\mathbb C^n$ with finite norm with respect to the inner product

$$\langle f|g\rangle := \pi^{-n}\int_{\mathbb C^n}\overline{f(z)}\,g(z)\,e^{-|z|^2}\,dz.$$

The space $\mathbb H$ with the above inner product is a Euclidean space; its closure is a Hilbert space, the multi-dimensional version of the Bargmann-Fock space described in Section 14.5.

The operators $a_k$ and $a_k^*$ are adjoints of each other. An orthogonal basis is given by the monomials:

$$\langle z_1^{k_1}\cdots z_n^{k_n}|z_1^{l_1}\cdots z_n^{l_n}\rangle = \prod_{i=1}^nk_i!\,\delta_{k_il_i}.$$
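In one variable, the Bargmann-Fock operators act on coefficient lists of polynomials. This is a minimal sketch that assumes $n = 1$ and $\hbar = 1$, and uses real rational coefficients. It takes the inner product from the monomial relation $\langle z^k|z^l\rangle = k!\,\delta_{kl}$ above. The sketch checks $[a,a^*] = 1$, the adjointness of $a = d/dz$ and $a^* = z$, and that the constant function $1$ is annihilated by $a$.

```python
from fractions import Fraction
from math import factorial

def deriv(p):
    """a p = dp/dz, where p = [p_0, p_1, ...] encodes sum_k p_k z^k."""
    return [k * p[k] for k in range(1, len(p))]

def times_z(p):
    """a* p = z p."""
    return [Fraction(0)] + list(p)

def inner(p, q):
    """<p|q> via <z^k|z^l> = k! delta_kl (one variable, real coefficients assumed)."""
    return sum(factorial(k) * x * y for k, (x, y) in enumerate(zip(p, q)))

p = [Fraction(2), Fraction(-1), Fraction(0), Fraction(3)]
q = [Fraction(1), Fraction(4), Fraction(1, 2)]

# [a, a*] p = (z p)' - z p' = p
commutator = [x - y for x, y in zip(deriv(times_z(p)), times_z(deriv(p)))]
```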

From the discussion in Section 10.1 it follows that the quadratic expressions modulo the linear expressions in the elements $z_i$ and $\partial/\partial z_i$ form the Lie algebra $sp(2n,\mathbb C)$. Taking all
^Remember the transformations ak = and = . 
quadratic expressions (so not modding out by the linear polynomials) in the elements $z_i$ and $\partial/\partial z_i$ one obtains a central extension of $isp(2n)$.

The above representation is irreducible (one sees rather quickly that starting with $1$ and acting repeatedly with the $a_k^*$ gives all polynomials, which are dense in $\mathbb H$) and is called the Bargmann-Fock representation. By the Stone-Von Neumann theorem, which says that there is only one irreducible unitary representation of the Heisenberg algebra, the Bargmann-Fock representation is up to isomorphism the only irreducible unitary representation of the Heisenberg algebra.

We have seen in Section 10.1 that the quadratic expressions (modulo linear terms) in the $q_i$ and the $p_k$ rotate the generators $q_i$ and $p_k$ into each other under the action of the Lie product. In other words, the action of the quadratic expressions builds a representation of $sp(2n,\mathbb C)$ inside the Bargmann-Fock representation. That this happens is not so strange. Let us consider the automorphism group of the Heisenberg algebra, consisting of all the invertible maps $h(n) \to h(n)$ preserving the Lie product. From equation (10.1) we see that the automorphism group contains the group $Sp(2n,\mathbb C)$. Now let us denote the above Bargmann-Fock representation by $U: h(n) \to \mathrm{Lin}(\mathbb H)$; then using $Sp(2n,\mathbb C)$ we get a new representation of the Heisenberg algebra as follows. For each $g \in Sp(2n,\mathbb C)$ we consider the representation

$$U_g: h(n) \to \mathrm{Lin}(\mathbb H), \qquad U_g(x) = U(gx).$$

Since $Sp(2n,\mathbb C) \subseteq \mathrm{Aut}(h(n))$, the $U_g$ are indeed representations. But the unitary irreducible representation of $h(n)$ is unique, up to isomorphism, and hence there must be a unitary operator $R(g)$ such that

$$U_g = R(g)\,U\,R(g)^{-1}.$$

It is clear that the $R(g)$ are determined up to a sign. Thus $R(g)R(h) = \pm R(gh)$, and we say that the $R(g)$ form a projective representation of the group $Sp(2n,\mathbb C)$. This representation is called the metaplectic representation. The operators $R(g)$ themselves form a group, closely related to the metaplectic group $Mp(2n,\mathbb R)$, whose Lie algebra is $sp(2n,\mathbb R)$. The metaplectic group is a two-fold cover of $Sp(2n,\mathbb R)$, so the covering map $Mp(2n,\mathbb R) \to Sp(2n,\mathbb R)$ has a kernel of order 2, while our group has the multiplicative group of the reals as center. Factoring out the positive reals leaves the metaplectic group.



14.7 Coherent states for the harmonic oscillator 

Coherent states were introduced in 1963 by Glauber [93], who recognized their importance in quantum optics; he received the Nobel prize in 2005 for his work in this direction. But the notion of a coherent state (without the name) was already introduced by Erwin Schrödinger [223] in 1926 when he was looking for solutions to the Schrödinger equation that satisfy the Heisenberg uncertainty relation

$$\Delta p\,\Delta q \ge \frac{\hbar}{2}, \qquad (14.20)$$

where $\Delta x$ denotes the standard deviation of a quantity $x$. Schrödinger was looking for states that were as classical as possible, having equality $\Delta p\,\Delta q = \hbar/2$. The coherent states satisfy this equality; they therefore build a bridge between classical physics and quantum physics that deepened as the notion of coherent states was extended to more general situations.
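The equality case of (14.20) can be observed numerically. This minimal sketch rests on several assumptions, and the formula for $p$ in particular is not taken from this section:

- units with $\hbar = m = \omega = 1$;
- an orthonormal number basis, rather than the weighted sequences used below;
- $q = (a + a^*)/\sqrt2$ and $p = i(a^* - a)/\sqrt2$;
- a truncation to `DIM` levels (a hypothetical choice).

In this basis, the normalized coherent state $|e^{-|z|^2/2}, z\rangle$ has coefficients $e^{-|z|^2/2}z^k/\sqrt{k!}$ with $z$ = `ALPHA`. For comparison, the number state $|1\rangle$ gives $\Delta q\,\Delta p = 3/2$.

```python
import math

DIM = 80              # truncation of the number basis (hypothetical choice)
ALPHA = 1.2 - 0.7j    # label z of the coherent state (arbitrary)

def lower(v):
    """a e_k = sqrt(k) e_{k-1} in an orthonormal number basis (assumed hbar = 1)."""
    return [math.sqrt(k + 1) * v[k + 1] for k in range(len(v) - 1)] + [0j]

def raise_(v):
    """a* e_k = sqrt(k+1) e_{k+1}; the top component is dropped by the truncation."""
    return [0j] + [math.sqrt(k) * v[k - 1] for k in range(1, len(v))]

def inner(u, v):
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def apply_q(v):   # q = (a + a*) / sqrt(2)   (assumed units hbar = m = omega = 1)
    return [(x + y) / math.sqrt(2) for x, y in zip(lower(v), raise_(v))]

def apply_p(v):   # p = i (a* - a) / sqrt(2) (assumed convention)
    return [1j * (y - x) / math.sqrt(2) for x, y in zip(lower(v), raise_(v))]

def spread(apply_op, v):
    """Standard deviation of a Hermitian X in the normalized state v, using <X^2> = ||X v||^2."""
    w = apply_op(v)
    mean = inner(v, w).real
    return math.sqrt(inner(w, w).real - mean ** 2)

# normalized coherent state: coefficients e^{-|alpha|^2/2} alpha^k / sqrt(k!)
coherent = []
c = complex(math.exp(-abs(ALPHA) ** 2 / 2))
for k in range(DIM):
    coherent.append(c)
    c *= ALPHA / math.sqrt(k + 1)

number_state_1 = [0j] * DIM
number_state_1[1] = 1 + 0j

product_coherent = spread(apply_q, coherent) * spread(apply_p, coherent)
product_number = spread(apply_q, number_state_1) * spread(apply_p, number_state_1)
```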

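To introduce Glauber's coherent states, we remind the reader that for the harmonic oscillator we constructed the Fock space $\mathbb H$ of sequences $\psi = (\psi_k)_{k\ge0}$ satisfying

$$\sum_{k=0}^\infty\frac{\hbar^k}{k!}|\psi_k|^2 < \infty. \qquad (14.21)$$

One may regard $\psi$ either as a vector with infinitely many components, or as an infinite sequence. Equivalently, in Dirac's bra-ket notation, the $\psi_k$'s are the complex coefficients in the expansion of $\psi$ with respect to an eigenbasis $|k\rangle$ of the number operator, $\psi = \sum_k\psi_k|k\rangle$. The inner product is given by

$$\langle\phi|\psi\rangle = \sum_{k=0}^\infty\frac{\hbar^k}{k!}\overline{\phi_k}\psi_k.$$

The operators $a$, $a^*$ and $n$ act as $(a\psi)_k = \hbar\psi_{k+1}$, $(a^*\psi)_k = k\psi_{k-1}$ and $(n\psi)_k = k\psi_k$. We now define a coherent state for the harmonic oscillator to be a vector of the form

$$|\lambda,z\rangle := (\lambda, \lambda z, \lambda z^2, \dots) \qquad (\lambda, z \in \mathbb C);$$

in other words, a state $\psi$ with coefficients $\psi_k = \lambda z^k$. By (14.17), we can write

$$|\lambda,z\rangle = \sum_{k=0}^\infty\lambda z^k\frac{(a^*)^k}{k!}\psi_0 = \lambda e^{za^*}\psi_0,$$

where $\psi_0$ is the ground state. Even more, we have

$$|\lambda,z\rangle = \sum_{k\ge0}\lambda z^k|k\rangle, \qquad (14.22)$$

and we see that $|\lambda,z\rangle \in \mathbb H$ since

$$\sum_{k\ge0}\frac{\hbar^k}{k!}|\lambda|^2|z|^{2k} = |\lambda|^2e^{\hbar|z|^2} < \infty.$$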



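The inner product between two coherent states is given by

$$\langle\lambda',z'|\lambda,z\rangle = \sum_{k=0}^\infty\frac{\hbar^k}{k!}\overline{\lambda'}\,\overline{z'}^k\lambda z^k = \overline{\lambda'}\lambda\,e^{\hbar\overline{z'}z}.$$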
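The closed form for the overlap of two coherent states can be compared with a truncated version of the defining series. This minimal sketch uses the weighted inner product of the Fock space with an arbitrary value of $\hbar$. The labels $(\lambda, z)$ and the truncation length are hypothetical choices.

```python
import cmath
import math

HBAR = 0.8  # arbitrary positive value for the check

def coherent(lam, z, terms=60):
    """Truncated coefficient sequence psi_k = lam z^k of |lam, z>, cf. (14.22)."""
    return [lam * z ** k for k in range(terms)]

def inner(phi, psi):
    """Inner product sum_k hbar^k / k! conj(phi_k) psi_k of the Fock space."""
    return sum(HBAR ** k / math.factorial(k) * complex(p).conjugate() * q
               for k, (p, q) in enumerate(zip(phi, psi)))

lam1, z1 = 0.5 + 1.0j, 0.3 - 0.9j
lam2, z2 = -1.1 + 0.2j, 1.0 + 0.4j

series = inner(coherent(lam1, z1), coherent(lam2, z2))
closed_form = lam1.conjugate() * lam2 * cmath.exp(HBAR * z1.conjugate() * z2)
```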



It is easy to see that

$$\langle\lambda,z|n\rangle = \overline\lambda\,\frac{\hbar^n\overline z^n}{n!}.$$



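Suppose $\psi$ is an element of $\mathbb H$; then

$$\langle\lambda,z|\psi\rangle = \overline\lambda\sum_{k=0}^\infty\frac{\hbar^k}{k!}\overline z^k\psi_k = \overline\lambda\,\psi(\overline z), \qquad (14.23)$$

which defines the function

$$\psi(z) := \sum_{k=0}^\infty\frac{\hbar^k}{k!}\psi_kz^k$$

corresponding to $\psi$. Conversely, given an analytic function $g(z) = \sum_{k\ge0}\frac{\hbar^k}{k!}g_kz^k$ with $\sum_{k\ge0}\frac{\hbar^k}{k!}|g_k|^2 < \infty$, we assign to $g$ the element $\psi_g = (g_k)_{k\ge0}$ in $\mathbb H$. We claim that $\psi \mapsto \psi(z)$ is a map from $\mathbb H$ to the set of analytic functions. In order to prove the claim we have to prove that the power series defining $\psi(z)$ converges everywhere. We calculate the radius of convergence $R$: by (14.21) there is a constant $C$ with $\frac{\hbar^k}{k!}|\psi_k|^2 \le C$ for all $k$, hence

$$\frac1R = \limsup_{k\to\infty}\Big|\frac{\hbar^k}{k!}\psi_k\Big|^{1/k} \le \limsup_{k\to\infty}\Big(C\,\frac{\hbar^k}{k!}\Big)^{1/(2k)} = 0.$$

Thus $R = \infty$, and the function $\psi(z)$ is analytic everywhere. The state $\psi$ is uniquely described by the function $\psi(z)$ in the sense that $\psi(z) \equiv 0$ implies $\psi = 0$, since

$$\psi_k = \hbar^{-k}\,\frac{d^k\psi}{dz^k}\Big|_{z=0}.$$

The inner product between $\phi$ and $\psi$ now becomes

$$\langle\phi|\psi\rangle = \sum_{k=0}^\infty\frac{1}{\hbar^kk!}\,\overline{\phi^{(k)}(0)}\,\psi^{(k)}(0).$$

So we can use the powerful theorems of complex analysis to deal with the states in the Hilbert space $\mathbb H$. For the relations between complex analysis and coherent states, including important generalizations to coherent states associated with other Lie groups, see Perelomov [193], Upmeier [247], Faraut & Korányi [75].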

Every element in $\mathbb H$ can be written as a superposition of coherent states, but the combination is in general not unique. For the harmonic oscillator a set of finitely many coherent states $|\lambda_i,z_i\rangle$ ($\lambda_i \ne 0$) with different $z_i$ is linearly independent. Indeed, suppose

$$\sum_{i=1}^nc_i|\lambda_i,z_i\rangle = 0;$$

then taking the inner product with $|1,w\rangle$ it follows that

$$\sum_{i=1}^nc_i\lambda_ie^{\hbar\overline wz_i} = 0$$

for all $w \in \mathbb C$. But a finite set of exponential functions $w \mapsto e^{\hbar\overline wz_i}$ with distinct $z_i$ is linearly independent. Hence all $c_i\lambda_i = 0$, i.e., all $c_i = 0$. The set of linear combinations of finitely many coherent states is dense in $\mathbb H$. The coherent states form a kind of "basis", but an overcomplete set. Such a set is called a frame. Frames are widely used in wavelet analysis.
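Finitely many vectors are linearly independent exactly when their Gram matrix is nonsingular. This gives a quick numerical illustration of the argument above. The sketch is minimal and assumes $\hbar = 1$ and $\lambda = 1$. It takes the Gram matrix $G_{ij} = \langle 1,z_i|1,z_j\rangle = e^{\hbar\overline{z_i}z_j}$ from the overlap formula and computes its determinant by Gaussian elimination. The sample points are arbitrary choices.

```python
import cmath

HBAR = 1.0  # assumption for the sketch

def gram(points):
    """Gram matrix G_ij = <1, z_i | 1, z_j> = exp(hbar conj(z_i) z_j)."""
    return [[cmath.exp(HBAR * zi.conjugate() * zj) for zj in points] for zi in points]

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    size = len(m)
    result = 1 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0j
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return result

distinct = det(gram([0j, 1 + 0j, 1j]))        # three different labels z
repeated = det(gram([0j, 1 + 0j, 1 + 0j]))    # a repeated label gives a singular Gram matrix
```

For the three labels $0, 1, i$, the determinant works out to $e^2 - 2e - 1 + 2\cos 1 > 0$.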

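We can get a so-called tight frame by taking the coherent states with unit norm. A tight frame in a Hilbert space $\mathbb H$ is a family of vectors $\{v_\alpha\}_{\alpha\in S}$ in $\mathbb H$, where $S$ is some index set, such that for some constant $A > 0$ every vector $v$ admits the expansion $v = A^{-1}\sum_{\alpha\in S}v_\alpha\langle v_\alpha|v\rangle$ (for a continuous index set, the sum is replaced by an integral). To demonstrate that the coherent states build a tight frame we simplify the notation and put $\hbar = 1$. Next we define

$$|z\rangle := |e^{-|z|^2/2}, z\rangle.$$

The vectors $|z\rangle$ have unit norm: $\langle z|z\rangle = 1$. We calculate for an element $|f\rangle = \sum_kf_k|k\rangle$ the following:

$$|z\rangle\langle z|f\rangle = \sum_{k,n}e^{-|z|^2}\frac{\overline z^k}{k!}f_kz^n|n\rangle.$$

The coefficient of each component $|n\rangle$ equals $e^{-|z|^2}z^nf(\overline z)$, where $f(z)$ is defined as

$$f(z) := \sum_k\frac{f_k}{k!}z^k.$$

From the above discussion we know that $f(z)$ is analytic everywhere. Thus the vector $\sum_ne^{-|z|^2}z^nf(\overline z)|n\rangle$ is of finite norm:

$$\sum_n\frac{1}{n!}\big|e^{-|z|^2}z^nf(\overline z)\big|^2 = e^{-2|z|^2}|f(\overline z)|^2\sum_n\frac{|z|^{2n}}{n!} = e^{-|z|^2}|f(\overline z)|^2 < \infty.$$

Hence $|z\rangle\langle z|f\rangle$ represents an element in $\mathbb H$ and we can integrate each component to get

$$\frac1\pi\int_{\mathbb C}|z\rangle\langle z|f\rangle\,dz = \frac1\pi\int_{\mathbb C}|z\rangle f(\overline z)\,e^{-|z|^2/2}\,dz = |f\rangle, \qquad (14.24)$$

where the integration measure is $dz = d(\mathrm{Re}\,z)\,d(\mathrm{Im}\,z)$ and where we used

$$\int_{\mathbb C}\overline z^nz^me^{-|z|^2}\,dz = \pi\,n!\,\delta_{nm}. \qquad (14.25)$$

In physics literature one writes the result (14.24) as

$$\frac1\pi\int_{\mathbb C}|z\rangle\langle z|\,dz = 1.$$

In mathematics, such an expression is called a resolution of the identity. The fact that the coherent states admit a resolution of the identity makes them useful; in particular, (14.24) shows that the normalized coherent states form a tight frame with $A = \pi$. We now wish to show that the expansion of $|f\rangle$ in coherent states is unique if the coefficient function is required to be of the form $e^{-|z|^2/2}g(\overline z)$ with analytic $g$, as $\langle z|f\rangle = e^{-|z|^2/2}f(\overline z)$ is. We use (14.24) to compute the inner product of $|f\rangle$ with a coherent state $\langle w|$: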



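$$\langle w|f\rangle = \frac1\pi\int_{\mathbb C}\langle w|z\rangle\langle z|f\rangle\,dz = \frac1\pi\,e^{-|w|^2/2}\int_{\mathbb C}e^{\overline wz-|z|^2}f(\overline z)\,dz.$$

But $f$ is an analytic function, so we first try $f(z) = z^n$. Using (14.25) we obtain the identity

$$\frac1\pi\int_{\mathbb C}e^{\overline wz-|z|^2}\,\overline z^n\,dz = \overline w^n.$$

Hence we derive the more general identity for analytic functions

$$\frac1\pi\int_{\mathbb C}e^{\overline wz-|z|^2}f(\overline z)\,dz = f(\overline w),$$

from which we obtain

$$f(\overline w) = e^{|w|^2/2}\langle w|f\rangle.$$

Hence the expansion of $f$ in coherent states is unique, since if $f = 0$, then all $\langle z|f\rangle = 0$ and the expansion vanishes identically. Note that the above discussion only works for analytic functions $f$. If we admit a non-analytic coefficient function, we get for example

$$\frac1\pi\int_{\mathbb C}e^{\overline wz-|z|^2}z^n\,dz = 0$$

for all $n > 0$, so that the nonzero coefficient function $e^{-|z|^2/2}z^n$ produces the zero vector. There is a relation between coherent states and the Hilbert space $\mathbb H_c$ of analytic functions $f: \mathbb C \to \mathbb C$ such that

$$\frac1\pi\int_{\mathbb C}|f(z)|^2e^{-|z|^2}\,dz < \infty,$$

for which the $(z^n)_{n\ge0}$ form an orthogonal basis. We refer the interested reader to Glauber [93], Segal [226], Bargmann [20, 21]. We just remark that if $f \in \mathbb H_c$ and we expand $f$ as

$$f(z) = \sum_{n\ge0}\frac{f_n}{n!}z^n,$$

then

$$\frac1\pi\int_{\mathbb C}|f(z)|^2e^{-|z|^2}\,dz = \frac1\pi\sum_{n,m\ge0}\frac{\overline{f_n}f_m}{n!\,m!}\int_{\mathbb C}\overline z^nz^me^{-|z|^2}\,dz = \sum_{n\ge0}\frac{|f_n|^2}{n!}.$$

Hence $f$ defines an element in the Fock space $\mathbb H$. (We have assumed one can change the order of integration and summation, but that can be made rigorous; see Bargmann [20] for a readable explanation.) The above discussion on the uniqueness of the expansion of an analytic function in terms of coherent states was taken from Glauber [93], which is a very readable account on coherent states and the physics and mathematics behind them.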
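The orthogonality relation (14.25) underlies the resolution of the identity (14.24), and it can be checked numerically in polar coordinates $z = re^{i\theta}$, $dz = r\,dr\,d\theta$. This minimal sketch evaluates the angular integral with a uniform rule, which is exact when $|m-n|$ is smaller than the number of angles. It evaluates the radial integral $\int_0^\infty r^{n+m+1}e^{-r^2}\,dr$ with the composite Simpson rule on $[0, r_{\max}]$. The step counts and the cutoff `r_max` are arbitrary choices.

```python
import cmath
import math

def gaussian_moment(n, m, radial_steps=4000, r_max=10.0, angles=64):
    """Approximate (1/pi) * integral over C of conj(z)^n z^m exp(-|z|^2) dz."""
    # angular integral of e^{i (m - n) theta}: a uniform rule, exact when |m - n| < angles
    ang = sum(cmath.exp(1j * (m - n) * 2 * math.pi * j / angles) for j in range(angles))
    ang *= 2 * math.pi / angles
    # radial integral of r^(n + m + 1) e^{-r^2} by the composite Simpson rule (even step count)
    h = r_max / radial_steps
    def g(r):
        return r ** (n + m + 1) * math.exp(-r * r)
    total = g(0.0) + g(r_max)
    for i in range(1, radial_steps):
        total += (4 if i % 2 else 2) * g(i * h)
    radial = total * h / 3
    return ang * radial / math.pi

# (14.25) predicts n! on the diagonal and 0 off the diagonal
table = [[gaussian_moment(n, m) for m in range(5)] for n in range(5)]
```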

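We now reinsert the constant $\hbar$ to see some of the behavior of the coherent states. Remember the formula for $q$:

$$q = (2\omega m)^{-1/2}(a + a^*).$$

We see easily that $a|\lambda,z\rangle = \hbar z|\lambda,z\rangle$. With a bit more work we see that

$$\langle\lambda,z|a^*|\lambda,z\rangle = \sum_{k,l}\overline\lambda\,\overline z^k\lambda z^l\langle k|a^*|l\rangle = \hbar\overline z\,|\lambda|^2\sum_{k=0}^\infty\frac{(\hbar|z|^2)^k}{k!} = \hbar\overline z\,|\lambda|^2e^{\hbar|z|^2},$$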






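from which it follows that

$$\langle\lambda,z|a^*|\lambda,z\rangle = \hbar\overline z\,\langle\lambda,z|\lambda,z\rangle.$$

We thus see that we can associate the real part of $z$ with the position:

$$\langle\lambda,z|q|\lambda,z\rangle = \hbar(2\omega m)^{-1/2}(z + \overline z)\,\langle\lambda,z|\lambda,z\rangle.$$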

In a similar fashion the imaginary part of $z$ is related to the momentum $p$. Let us pause for a while to see what the above means. The harmonic oscillator has a very symmetric shape. One can show that the wave functions which are eigenvectors of the number operator $n$ respect the symmetry $q \to -q$ in the sense that under $q \to -q$ they change by a factor $(-1)^n$ (see any introductory book on quantum mechanics, e.g., Griffiths [99]). This means that for all wave functions that are eigenfunctions of the number operator the average position is precisely in the middle, at $q_0$. Similarly, the momentum has zero expectation value. The coherent states represent shifted states; their position is not in the middle. Note that we have used the Heisenberg picture, where the states are time-independent. To see the time-dependent behavior of the coherent states, we consider the product

$$e^{-iHt/\hbar}|\lambda,z\rangle.$$

If we now shift the lowest energy to zero, so that $H = \omega a^*a$ and $H|k\rangle = \hbar\omega k|k\rangle$, and use (14.22), we see

$$e^{-iHt/\hbar}|\lambda,z\rangle = |\lambda, ze^{-i\omega t}\rangle.$$

The coherent states thus swing from left to right in the potential with frequency $\omega$ and with amplitude $2\hbar|z|/\sqrt{2m\omega}$.
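The time evolution and the position expectation can be checked in the weighted sequence representation (14.18) and (14.19). This minimal sketch assumes arbitrary values for $\hbar$, $m$, $\omega$, the labels $(\lambda, z)$, and the truncation length. It evolves $|\lambda,z\rangle$ by multiplying $\psi_k$ with $e^{-i\omega kt}$, which is $e^{-iHt/\hbar}$ for $H|k\rangle = \hbar\omega k|k\rangle$. It then compares the result with $|\lambda, ze^{-i\omega t}\rangle$ and checks that $\langle q\rangle$ oscillates with amplitude $2\hbar|z|/\sqrt{2m\omega}$.

```python
import cmath
import math

HBAR, MASS, OMEGA = 0.9, 1.3, 2.0   # arbitrary positive constants for the check
TERMS = 80                           # truncation of the coefficient sequences (hypothetical)

def coherent(lam, z):
    """psi_k = lam z^k, cf. (14.22)."""
    return [lam * z ** k for k in range(TERMS)]

def inner(phi, psi):
    """sum_k hbar^k / k! conj(phi_k) psi_k, cf. (14.19)."""
    return sum(HBAR ** k / math.factorial(k) * complex(p).conjugate() * q
               for k, (p, q) in enumerate(zip(phi, psi)))

def apply_q(psi):
    """q = (2 omega m)^(-1/2) (a + a*) with (a psi)_k = hbar psi_{k+1}, (a* psi)_k = k psi_{k-1}."""
    a_psi = [HBAR * psi[k + 1] for k in range(len(psi) - 1)] + [0j]
    a_star_psi = [0j] + [k * psi[k - 1] for k in range(1, len(psi))]
    return [(x + y) / math.sqrt(2 * OMEGA * MASS) for x, y in zip(a_psi, a_star_psi)]

def evolve(psi, t):
    """e^{-iHt/hbar} with H|k> = hbar omega k |k>: psi_k -> e^{-i omega k t} psi_k."""
    return [c * cmath.exp(-1j * OMEGA * k * t) for k, c in enumerate(psi)]

def mean_q(psi):
    return (inner(psi, apply_q(psi)) / inner(psi, psi)).real

lam, z = 1.5 - 0.5j, 0.6 + 0.8j
amplitude = 2 * HBAR * abs(z) / math.sqrt(2 * OMEGA * MASS)
period = 2 * math.pi / OMEGA
samples = [mean_q(evolve(coherent(lam, z), period * j / 400)) for j in range(400)]
```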

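In order to see the action of the Heisenberg group on the coherent states, we calculate

$$(e^{\alpha n}\psi)_k = e^{\alpha k}\psi_k, \qquad\text{hence}\qquad e^{\alpha n}|\lambda,z\rangle = |\lambda, e^\alpha z\rangle.$$

From $a|\lambda,z\rangle = \hbar z|\lambda,z\rangle$ it follows that

$$e^{\alpha a}|\lambda,z\rangle = e^{\alpha\hbar z}|\lambda,z\rangle = |e^{\alpha\hbar z}\lambda, z\rangle.$$

Further, since $|\lambda,z\rangle = \lambda e^{za^*}\psi_0$, we have

$$e^{\alpha a^*}|\lambda,z\rangle = \lambda e^{(z+\alpha)a^*}\psi_0 = |\lambda, z+\alpha\rangle,$$

and also we have

$$e^\alpha|\lambda,z\rangle = |e^\alpha\lambda, z\rangle.$$

We summarize this and write

$$e^{\alpha n}|\lambda,z\rangle = |\lambda, e^\alpha z\rangle, \qquad e^{\alpha a}|\lambda,z\rangle = |e^{\alpha\hbar z}\lambda, z\rangle, \qquad (14.26)$$

$$e^{\alpha a^*}|\lambda,z\rangle = |\lambda, z+\alpha\rangle, \qquad e^\alpha|\lambda,z\rangle = |e^\alpha\lambda, z\rangle. \qquad (14.27)$$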
We can apply arbitrary group elements by taking products.
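The formulas (14.26) and (14.27) can be tested by summing the exponential series of each operator on a truncated coefficient sequence. This minimal sketch uses the action (14.18) and arbitrary values for $\hbar$, $\lambda$, $z$, $\alpha$ and the truncation length. Only the first few coefficients are compared, since these are not noticeably affected by the truncation. The helper names are hypothetical.

```python
import cmath

HBAR = 0.5  # arbitrary positive value for the check
K = 60      # length of the truncated coefficient sequences (hypothetical)

def coherent(lam, z):
    """psi_k = lam z^k, cf. (14.22), truncated to K coefficients."""
    return [lam * z ** k for k in range(K)]

def a(psi):
    """(a psi)_k = hbar psi_{k+1}; the missing top coefficient is taken as 0."""
    return [HBAR * psi[k + 1] for k in range(len(psi) - 1)] + [0j]

def a_star(psi):
    """(a* psi)_k = k psi_{k-1}; the coefficient pushed past the end is dropped."""
    return [0j] + [k * psi[k - 1] for k in range(1, len(psi))]

def number(psi):
    """(n psi)_k = k psi_k."""
    return [k * c for k, c in enumerate(psi)]

def exp_apply(op, alpha, psi, terms=K):
    """Truncated power series sum_j alpha^j / j! op^j psi, standing in for e^{alpha op} psi."""
    result = [0j] * len(psi)
    term = list(psi)
    for j in range(terms):
        result = [r + t for r, t in zip(result, term)]
        term = [alpha / (j + 1) * t for t in op(term)]
    return result

def close(u, v, count=10):
    """Compare the first few coefficients with a relative tolerance."""
    return all(abs(x - y) <= 1e-9 * max(1.0, abs(y)) for x, y in zip(u[:count], v[:count]))

lam, z, alpha = 0.7 + 0.1j, 1.1 - 0.4j, 0.3 + 0.2j
psi = coherent(lam, z)
```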

The Glauber coherent states introduced in the present section for the Heisenberg group
can be generalized. Indeed, the concept of coherent states extends to a large class of Lie 
groups acting on so-called co-adjoint orbits of the group. In each case, the co-adjoint orbit 
provides a manifold of labels for the coherent states on which the group acts, and the 
coherent states span in an overcomplete fashion a Hilbert space on which the group acts as 
an irreducible highest weight representation. We shall discuss highest weight representations 
in Chapter 16, but cannot give details for the general case mentioned here. Instead, we 
refer the reader to the book by Perelomov [193] and to the extensive survey by Zhang 
et al. [269]. 

14.8 Monochromatic beams and coherent states 

As indicated in Section 2.6 for the case of a beam of monochromatic light, the modes of the electromagnetic field play the role of the annihilation and creation operators of the quantum field. Classically the observables are functions on phase space; hence to each observable we can associate an operator acting on configurations. Namely, a physical configuration is specified by giving the values of the observables, and to any observable we assign the operator that reads off the value of that observable. Thus if a configuration of a laser beam, which we suggestively denote by $|E\rangle$, is specified by an electric field $E(x,y,z,t)$, then the operator $\mathcal E(x,y,z,t)$ reads off the values of the components of the electric field at the space-time point $(x,y,z,t)$:

$$\mathcal E(x,y,z,t)|E\rangle = E(x,y,z,t)|E\rangle.$$

In the transition from classical mechanics to quantum mechanics the role of the operator $\mathcal E(x,y,z,t)$ is played by the positive frequency part of the electromagnetic field operator, which is built from annihilation operators. The above equation then tells us that $|E\rangle$ is an eigenvector of the annihilation operator.

In a laser beam there are many photons, and the number of photons need not be constant, due to absorption and due to the constant photon production of the laser. Hence, from a micromechanical point of view, the quantum number $n$ is no longer a good quantum number to assign to a system resembling a laser. However, we know that the electric field of a laser has a nearly perfectly constant amplitude, and if the beam goes in one direction we can take the classical expression

$$E(x,y,z,t) = \alpha\,u(x)\,e^{-i\omega t}$$

for (the positive frequency part of) the electric field, where $u(x)$ describes the spatial mode and $\alpha \in \mathbb C$ is its complex amplitude. Since the classical state of the laser has a well-defined value of the electric field, the quantum state $|E\rangle$ that mimics the classical state the most is the one where

$$a\,u(x)\,e^{-i\omega t}|E\rangle = E(x,y,z,t)|E\rangle,$$

that is, $a|E\rangle = \alpha|E\rangle$. But then $|E\rangle$ is an eigenvector of the annihilation operator $a$, i.e., a coherent state (with $\hbar z = \alpha$). This similarity between coherent states and classical states is what motivated Roy Glauber to investigate coherent states and apply his analysis to the (quantum and semiclassical) theory of light.
All this extends with suitable modifications to the other wave equations described in Sections 2.5 and 2.4. In each case, there are families of coherent states describing nearly classical ray-like behavior, and there are more exotic quantum states which behave quite unlike any classical system.
Chapter 15

Spin and fermions 



This chapter discusses the quantum mechanics of spinning systems, where the only relevant
degrees of freedom correspond to rotation. 

The quantum version of the spinning top discussed in Section 9.4 can be obtained by 
looking for canonical anticommutation relations, which naturally produce the Lie algebra 
of a spinning top. As for oscillators, the canonical anticommutation relations have a unique 
irreducible unitary representation, which corresponds to a spin 1/2 representation of the 
rotation group. The multimode version gives rise to fermionic Fock spaces; in contrast 
to the bosonic case, these are finite-dimensional when the number of modes is finite. In
particular, the single mode fermionic Fock space is 2-dimensional. 

Many constructions for bosons and fermions only differ in the signs of certain terms, such as commutators versus anticommutators. For example, quadratic expressions in bosonic or fermionic Fock spaces form Lie algebras, which give natural representations of the double covers of the Lie groups with Lie algebras $so(n)$ (in the fermionic case) and $sp(2n,\mathbb R)$ (in the bosonic case), the so-called spin groups and metaplectic groups, respectively. In fact, the analogies apart from sign lead to a common generalization of bosonic and fermionic objects in the form of super Lie algebras, which, however, are outside the scope of the book.

Apart from the Fock representation, the rotation group has a unique irreducible unitary representation of each finite dimension. We derive these spinor representations by restriction of corresponding nonunitary representations of the general linear group $GL(2,\mathbb C)$ on homogeneous polynomials in two variables, and find corresponding spin coherent states.



15.1 Fermion Fock space 

As we have seen in Section 9.4 the affine functions in the Poisson algebra of the spinning top make up the Lie algebra $u(2)$. One can thus expect that the quantization of the spinning top boils down to representation theory of $su(2)$ and $u(2)$, and indeed it does. In the following sections the representations of $su(2)$ and $u(2)$ play an important role. See for example Humphreys [117] or Jacobsen [121] for a comparison of the methods used.

In this section, however, we look at a particular representation, the Fock representation of
$u(2)$. It behaves in many respects like the Fock representation of the Heisenberg algebra,
and gives the right generalization to the case of many fermionic modes, and in particular 
to quantum field theory. 

In fact, there are many analogies between bosonic and fermionic systems; many formulas look alike, apart from the occurrence of additional minus signs in certain places.^ Although very similar in many respects, there is a fundamental difference in the basic representation theory of bosons and fermions. While bosons are characterized by canonical commutation relations, fermions are quantized using canonical anticommutation relations. We shall see in a moment that this naturally reproduces the Lie algebra $u(2)$ of a spinning top and, just like for canonical commutation relations, uniquely fixes the representation.

We define a signed commutator

$$[f,g]_\pm := fg \mp gf;$$

the upper sign applies to 'bosonic' quantities $f$, $g$, and reproduces the ordinary commutator, $[f,g]_+ = [f,g]$, while the lower sign applies to 'fermionic' quantities $f$, $g$, and reproduces the anticommutator

$$[f,g]_- = fg + gf.$$

Often, the anticommutator is written instead as $\{f,g\}$, which looks like a Poisson bracket; therefore we don't recommend this notation. In the theory of Lie superalgebras, the sign at the commutator is not written at all, since the context already determines the nature of the arguments, and hence which sign applies.

To understand how anticommutators give rise to the $u(2)$ Lie algebra governing a spinning top, we impose the canonical anticommutation relations on operators $a$ and $a^* = (a)^*$ in some Hilbert space:

$$[a,a^*]_- = \hbar, \qquad [a,a]_- = 0, \qquad [a^*,a^*]_- = 0.$$

In particular, we have $a^2 = (a^*)^2 = 0$. The algebra $E$ generated by $1$, $a$ and $a^*$ is four-dimensional, since $1$, $a$, $a^*$ and $aa^* - a^*a$ already span $E$ (all other products reduce to these, e.g., $a^*a = \hbar - aa^*$ and $aa^*a = \hbar a$). Hence $E$ is isomorphic to the algebra of complex $2\times 2$-matrices; an explicit isomorphism is obtained by identifying $a$ and $a^*$ with the matrices

$$\sqrt{\hbar}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad \sqrt{\hbar}\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},$$

respectively, and $aa^* - a^*a$ with $\hbar\sigma_3$. The Lie $*$-algebra $L$ spanned by $1$, $a$, $a^*$ and $[a,a^*]$ is thus $u(2)$. Thus the anticommutation relations automatically produce the right Lie algebra for a spinning top, and $u(2)$ is the fermionic analogue of the oscillator algebra $os(1)$.
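A minimal numerical sanity check of the $2\times 2$ identification, as a sketch in Python with NumPy. The value of `hbar` is an arbitrary assumption, and the helper `anticomm` is ours. The sketch verifies the canonical anticommutation relations and the identity $aa^*-a^*a=\hbar\sigma_3$. It also checks that $n=\hbar^{-1}a^*a$ satisfies $n^2=n$ and has the eigenvalues $0$ and $1$.

```python
import numpy as np

hbar = 0.5  # arbitrary illustrative value (assumption for this check)

# Identification from the text: a ~ sqrt(hbar) [[0, 1], [0, 0]], a* = adjoint of a
a = np.sqrt(hbar) * np.array([[0, 1], [0, 0]], dtype=complex)
a_star = a.conj().T
one = np.eye(2)
sigma3 = np.diag([1.0, -1.0])


def anticomm(x, y):
    """Signed commutator with the lower sign: [x, y]_- = xy + yx."""
    return x @ y + y @ x


n = a_star @ a / hbar                  # number operator n = a*a / hbar
n_eigenvalues = np.linalg.eigvalsh(n)  # returned in ascending order

print(np.allclose(anticomm(a, a_star), hbar * one))        # True
print(np.allclose(a @ a_star - a_star @ a, hbar * sigma3))  # True
print(np.allclose(n_eigenvalues, [0.0, 1.0]))              # True
```

Replacing `hbar` by any other positive value leaves all checks unchanged, since $n$ itself does not depend on $\hbar$.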

¹For how this leads to the vast mathematical area of superalgebras and supergeometry, we refer the interested reader to, for example, Varadarajan [252], Scheunert [221], Tuynman [245], Deligne et al. [72], and the references there.

We can get the same result more formally on the quantum level, in a way that is completely analogous to the bosonic case, by considering an arbitrary unitary representation of the canonical anticommutation relations, i.e., linear operators $a$ and $a^*$ satisfying these relations. We introduce the operator

$$n := \hbar^{-1}a^*a,$$

and we let $\psi \in V_\lambda$ for some Verma module $V_\lambda$; as in Section 14.3, this means $n\psi = \lambda\psi$. We obtain

$$na\psi = \hbar^{-1}a^*aa\psi = 0,$$

and hence $a\psi \in V_0$. Further,

$$na^*\psi = \hbar^{-1}a^*aa^*\psi = \hbar^{-1}a^*(\hbar - a^*a)\psi = a^*\psi,$$

and therefore $a^*\psi \in V_1$. To compute $\lambda$, we note that

$$\hbar^2n^2\psi = a^*aa^*a\psi = a^*(\hbar - a^*a)a\psi = \hbar a^*a\psi = \hbar^2n\psi,$$

and hence $\lambda^2 - \lambda = 0$, from which we deduce $\lambda = 0$ or $\lambda = 1$. Thus we have arrived at the remarkable conclusion that the canonical anticommutation relations lead to two-dimensional Hilbert spaces.

We take a basis vector $|0\rangle$ in $V_0$ and define $|1\rangle := a^*|0\rangle$. The Lie $*$-algebra $L$ acts on the space spanned by $|0\rangle$ and $|1\rangle$, which is isomorphic to $\mathbb C^2$. This representation is called the Pauli representation. We have thus shown that the Pauli representation is the unique irreducible unitary representation of the canonical anticommutation relations (up to unitary equivalence). This is the fermionic analogue of the Stone-von Neumann theorem.

In analogy with the boson case, the representation space is called the single-mode fermion Fock space.

We shall see in Section 15.5 that the irreducible representations of $su(2)$ are in one-to-one correspondence with the (finite) dimension of the representation space. For historical reasons, this dimension is usually denoted by $2j+1$, and $j$ is called the spin of the representation. Clearly, the spin $j$ is half a nonnegative integer. In particular, the single-mode fermion Fock space has dimension 2 and hence spin $j = 1/2$.

In general (cf. Section 10.5), elementary particles are associated with an irreducible representation of the Poincaré algebra (or, in the nonrelativistic limit, of the Galileo algebra), which is characterized by mass and spin. The spin assignment in these representations is such that, in the massive case, the restriction to a center-of-mass frame at a fixed time gives an irreducible representation of the Lie algebra $so(3) \cong su(2)$ of the same spin. (The massless case is not related to $u(2)$.)

Elementary particles of integral spin (bosons) are represented by a bosonic Fock space, those of nonintegral spin (fermions) by a fermionic Fock space. This fact is a consequence of the so-called spin-statistics theorem, which holds under certain causality assumptions related to the Poincaré invariance of a field theory. Fermionic particles obey the Pauli exclusion principle (Pauli [189], Schwinger [225], Streater [237]).



15.2 Extension to many degrees of freedom



Suppose that the algebra of linear operators on some vector space $\mathbb H$ contains, for some linearly ordered set² $M$ of labels, quantities $a_j$ and $a_j^*$ ($j \in M$) satisfying the relations

$$[a_j,a_k]_\pm = [a_j^*,a_k^*]_\pm = 0, \qquad [a_j,a_k^*]_\pm = \delta_{jk} \qquad (j,k \in M). \tag{15.1}$$

For the upper sign (the bosonic case), these are just the canonical commutation relations 
defining a Heisenberg algebra corresponding to harmonic oscillators with finitely many 
degrees of freedom. For the lower sign (the fermionic case), the relations (15.1) generalize 
the canonical anticommutation relations which we have met for the spinning top; we thus 
expect to get an analogue of the spinning top with n degrees of freedom. 

In this section, we consider the fermionic case. We first assume that we have a unitary faithful representation and deduce enough properties to determine the representation uniquely. Then we use the properties deduced to construct the representation.

The canonical anticommutation relations imply that

$$a_ja_k = -a_ka_j, \qquad a_j^*a_k^* = -a_k^*a_j^*, \qquad a_ja_k^* = \delta_{jk} - a_k^*a_j, \tag{15.2}$$

and again we have in particular $a_j^2 = (a_j^*)^2 = 0$. To find the unitary representations of physical interest, we assume, in analogy to the bosonic case of the canonical commutation relations, the existence of a nonzero vector $\psi_0$, the ground state, such that

$$a_j\psi_0 = 0 \qquad \text{for all } j \in M. \tag{15.3}$$

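We next define, for any finite set $J = \{j_1,\dots,j_l\}$ of distinct labels $j_1 < \dots < j_l$ from $M$, the vectors

$$|J\rangle := |j_1\dots j_l\rangle := a_{j_1}^*\cdots a_{j_l}^*|0\rangle, \qquad |0\rangle := \psi_0. \tag{15.4}$$

These vectors are nonzero. Indeed, suppose $|J\rangle = 0$ for $J = \{j_1,\dots,j_l\}$. Acting on $|J\rangle$ with $a_{j_l}\cdots a_{j_2}$ and using (15.2) and (15.3), we find $a_{j_1}^*\psi_0 = 0$; since also $a_{j_1}\psi_0 = 0$, the relation $a_{j_1}a_{j_1}^* + a_{j_1}^*a_{j_1} = 1$ gives $\psi_0 = 0$, a contradiction. Because of (15.2), we have

$$a_j|J\rangle = \begin{cases} \varepsilon_j(J)\,|J\setminus\{j\}\rangle & \text{if } j \in J, \\ 0 & \text{if } j \notin J, \end{cases} \qquad a_j^*|J\rangle = \begin{cases} 0 & \text{if } j \in J, \\ \varepsilon_j(J)\,|J\cup\{j\}\rangle & \text{if } j \notin J, \end{cases} \tag{15.5}$$

where the sign $\varepsilon_j(J)$ is defined to be $+1$ if there is an even number of indices in $J$ that are smaller than $j$, and $-1$ otherwise.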

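15.2.1 Proposition. We have the following identities:

$$\varepsilon_j(J\setminus\{j\}) = \varepsilon_j(J) \qquad \text{for } j \in J, \tag{15.6}$$

$$\varepsilon_j(J\cup\{j\}) = \varepsilon_j(J) \qquad \text{for } j \notin J, \tag{15.7}$$

$$\varepsilon_j(J)\,\varepsilon_k(J\cup\{j\}) = -\varepsilon_k(J)\,\varepsilon_j(J\cup\{k\}) \qquad \text{for } j,k \notin J,\ j \ne k, \tag{15.8}$$

$$\varepsilon_j(J)\,\varepsilon_k(J\setminus\{j\}) = -\varepsilon_k(J)\,\varepsilon_j(J\setminus\{k\}) \qquad \text{for } j,k \in J,\ j \ne k, \tag{15.9}$$

$$\varepsilon_j(J)\,\varepsilon_k(J\cup\{j\}) = -\varepsilon_k(J)\,\varepsilon_j(J\setminus\{k\}) \qquad \text{for } j \notin J,\ k \in J. \tag{15.10}$$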



²A set $M$ is linearly ordered if there is a binary relation $\le$ such that for all $m,n,p \in M$: (1) $m \le m$; (2) if $m \le n$ and $n \le m$, then $m = n$; (3) if $m \le n$ and $n \le p$, then $m \le p$; (4) $m \le n$ or $n \le m$.

Proof. This is a straightforward consequence of the definition, taking into account when $\varepsilon_j(J)$ and $\varepsilon_k(J)$ change sign if an index is removed from or added to $J$: adding or removing an index $k$ flips $\varepsilon_j$ exactly when $k < j$. □
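The sign identities can also be confirmed by brute force over all subsets of a small label set. The following Python sketch implements $\varepsilon_j(J)$ exactly as defined after (15.5) and asserts (15.6) to (15.10). The function names are ours, not from the text.

```python
from itertools import combinations


def eps(j, J):
    """Sign eps_j(J): +1 if an even number of indices in J are smaller than j, else -1."""
    return 1 if sum(1 for i in J if i < j) % 2 == 0 else -1


def check_sign_identities(labels):
    """Assert (15.6)-(15.10) for all subsets J of a finite, ordered label set.

    Returns the number of ordered pairs (j, k), j != k, that were examined.
    """
    labels = sorted(labels)
    subsets = [frozenset(c) for r in range(len(labels) + 1)
               for c in combinations(labels, r)]
    pairs_examined = 0
    for J in subsets:
        for j in labels:
            if j in J:
                assert eps(j, J - {j}) == eps(j, J)                       # (15.6)
            else:
                assert eps(j, J | {j}) == eps(j, J)                       # (15.7)
            for k in labels:
                if k == j:
                    continue
                if j not in J and k not in J:                             # (15.8)
                    assert eps(j, J) * eps(k, J | {j}) == -eps(k, J) * eps(j, J | {k})
                elif j in J and k in J:                                   # (15.9)
                    assert eps(j, J) * eps(k, J - {j}) == -eps(k, J) * eps(j, J - {k})
                elif j not in J and k in J:                               # (15.10)
                    assert eps(j, J) * eps(k, J | {j}) == -eps(k, J) * eps(j, J - {k})
                pairs_examined += 1
    return pairs_examined
```

For example, `check_sign_identities(range(5))` runs over all 32 subsets of a five-element label set without raising.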

We define $F_-(M)$ to be the vector space spanned by the $|J\rangle$. By definition, $F_-(M)$ consists of the finite linear combinations of the elements $|J\rangle$.

15.2.2 Proposition.

(i) The vectors $|J\rangle$ are linearly independent.

(ii) The vector space $F_-(M)$ is an irreducible representation space for the canonical anticommutation relations.

Proof. (i) Suppose that $\sum_J c_J|J\rangle = 0$ with finitely many nonzero coefficients, and let $J = \{j_1,\dots,j_l\}$ ($j_1 < \dots < j_l$) be a set of maximal size among the sets with $c_J \ne 0$. In view of (15.5), multiplication by $a_{j_1}\cdots a_{j_l}$ leaves as the only nonzero term $\pm c_J|0\rangle = 0$. Since the ground state $\psi_0 = |0\rangle$ is nonzero, we conclude that $c_J = 0$, a contradiction. Therefore, the vectors (15.4) are linearly independent and form a basis of $F_-(M)$.

(ii) Equations (15.5) imply that $a_j$ and $a_j^*$ map $F_-(M)$ into itself. Irreducibility of the representation follows since the same argument as in (i) implies that any invariant subspace of $F_-(M)$ containing a nonzero element contains the ground state, hence all $|J\rangle$, and hence all elements of $F_-(M)$. □



Since the $|J\rangle$ form a basis of $F_-(M)$, we may identify a vector $\psi \in F_-(M)$ with the fermion wave function $\psi$ defined on the finite subsets of $M$, whose value at $J \subseteq M$ is the coefficient $\psi(J)$ in the basis expansion

$$\psi = \sum_{J\subseteq M} \psi(J)\,|J\rangle,$$

where the summation is over all finite subsets $J$ of $M$. Note that only finitely many coefficients $\psi(J)$ are nonzero.

15.2.3 Proposition. Under the assumption (15.3), the anticommutation relations (15.2) imply that the linear operators

$$a(u) := \sum_{j\in M} \bar u_j\,a_j, \qquad a^*(u) := \sum_{j\in M} u_j\,a_j^*,$$

defined for all vectors $u$ indexed by $M$ which have only finitely many nonzero entries, act on fermion wave functions according to

$$(a(u)\psi)(J) = \sum_{j\notin J} \bar u_j\,\varepsilon_j(J)\,\psi(J\cup\{j\}), \qquad (a^*(u)\psi)(J) = \sum_{j\in J} \varepsilon_j(J)\,u_j\,\psi(J\setminus\{j\}). \tag{15.11}$$

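Proof. We have

$$a_j\psi = \sum_J \psi(J)\,a_j|J\rangle = \sum_{J\ni j} \psi(J)\,\varepsilon_j(J)\,|J\setminus\{j\}\rangle = \sum_{J\not\ni j} \psi(J\cup\{j\})\,\varepsilon_j(J\cup\{j\})\,|J\rangle,$$

and using (15.7), we find

$$(a_j\psi)(J) = \begin{cases} \varepsilon_j(J)\,\psi(J\cup\{j\}) & \text{if } j \notin J, \\ 0 & \text{if } j \in J. \end{cases}$$

Taking linear combinations proves the first assertion. Similarly,

$$a_j^*\psi = \sum_J \psi(J)\,a_j^*|J\rangle = \sum_{J\not\ni j} \psi(J)\,\varepsilon_j(J)\,|J\cup\{j\}\rangle = \sum_{J\ni j} \psi(J\setminus\{j\})\,\varepsilon_j(J\setminus\{j\})\,|J\rangle.$$

Using (15.6), we find

$$(a_j^*\psi)(J) = \begin{cases} \varepsilon_j(J)\,\psi(J\setminus\{j\}) & \text{if } j \in J, \\ 0 & \text{if } j \notin J. \end{cases}$$

Taking linear combinations proves the second assertion. □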
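To make (15.11) concrete, here is a small Python sketch that stores a fermion wave function as a dictionary from finite subsets (frozensets) of $M$ to coefficients. It applies $a_j$ and $a_j^*$ exactly as in the single-mode formulas of the proof. The dictionary representation and the names `annihilate`, `create`, `add` are illustrative choices, not part of the text.

```python
def eps(j, J):
    """eps_j(J) = +1 if an even number of indices in J are smaller than j, else -1."""
    return 1 if sum(1 for i in J if i < j) % 2 == 0 else -1


def annihilate(j, psi):
    """(a_j psi)(J) = eps_j(J) psi(J with j added) for j not in J, and 0 otherwise."""
    out = {}
    for K, c in psi.items():          # K plays the role of J with j added
        if j in K:
            J = K - {j}
            out[J] = out.get(J, 0) + eps(j, J) * c
    return out


def create(j, psi):
    """(a_j^* psi)(J) = eps_j(J) psi(J without j) for j in J, and 0 otherwise."""
    out = {}
    for K, c in psi.items():          # K plays the role of J without j
        if j not in K:
            J = K | {j}
            out[J] = out.get(J, 0) + eps(j, J) * c
    return out


def add(*psis):
    """Sum of wave functions, dropping zero coefficients."""
    out = {}
    for p in psis:
        for K, c in p.items():
            out[K] = out.get(K, 0) + c
    return {K: c for K, c in out.items() if c != 0}


vacuum = {frozenset(): 1}                        # the ground state |0>
example = {frozenset(): 1.0, frozenset({0, 2}): 2 - 1j, frozenset({1}): 0.5}
```

For instance, `create(1, create(0, vacuum))` yields `{frozenset({0, 1}): -1}`, reflecting $a_1^*a_0^*|0\rangle = -a_0^*a_1^*|0\rangle = -|\{0,1\}\rangle$.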



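For a unitary representation, we need that $a_j^*$ is the adjoint of $a_j$, which is the case if and only if the $|J\rangle$ are orthonormal. Indeed, suppose $J \ne J'$; then we may assume that there is some $j \in J$ that is not in $J'$ (else interchange the roles of $J$ and $J'$). By (15.5) and (15.6), $|J\rangle = \varepsilon_j(J)\,a_j^*|J\setminus\{j\}\rangle$, so that $\langle J|J'\rangle = \varepsilon_j(J)\,\langle J\setminus\{j\}|a_j|J'\rangle = 0$, since $a_j|J'\rangle = 0$. Hence we may assume that $F_-(M)$ has the inner product

$$\langle\phi|\psi\rangle := \sum_J \overline{\phi(J)}\,\psi(J). \tag{15.12}$$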

To show that unitary representations with the desired conjugation and anticommutation relations actually exist, we start with the space $F_-(M)$ of complex-valued functions $\psi$ defined on the finite subsets of an arbitrary linearly ordered set $M$ such that only finitely many values $\psi(J)$ are nonzero. Then (15.12) defines an inner product on $F_-(M)$, and the completion $\overline{F_-(M)}$ of $F_-(M)$ in the associated norm is a Hilbert space, called the fermion Fock space over $M$.

For a concise formulation of the result, we use a slightly more abstract notation. We introduce the Euclidean space $\mathbb H$ of vectors indexed by $M$ with finite support, equipped with the Hermitian form

$$u^*v := \sum_{k\in M} \bar u_k v_k,$$

and write $F_-\mathbb H := F_-(M)$. In applications to quantum field theory, $\mathbb H$ becomes the infinite-dimensional single-particle Hilbert space, and the sums become integrals over momentum vectors, but the formulas below remain valid with an appropriate interpretation.

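15.2.4 Theorem. The relations (15.11) define two mappings $a$, $a^*$ from $\mathbb H$ to the algebra $\mathrm{Lin}\,F_-\mathbb H$ ($a^*$ is linear and $a$ is conjugate linear), and we have

$$(a(u))^* = a^*(u), \tag{15.13}$$

$$a(u)a(v) + a(v)a(u) = 0, \qquad a^*(u)a^*(v) + a^*(v)a^*(u) = 0, \tag{15.14}$$

$$a(u)a^*(v) + a^*(v)a(u) = u^*v. \tag{15.15}$$

In particular, taking for $u$, $v$ vectors with a single nonzero entry equal to 1, we recover the canonical anticommutation relations (15.2).

Proof. From the definitions (15.11) and (15.12), we find

$$\langle\phi|a^*(u)\psi\rangle = \sum_J \overline{\phi(J)} \sum_{j\in J} \varepsilon_j(J)\,u_j\,\psi(J\setminus\{j\}).$$

Renaming $J\setminus\{j\}$ to $J$, we get, in view of (15.6) and (15.7),

$$\langle\phi|a^*(u)\psi\rangle = \sum_J \sum_{j\notin J} u_j\,\varepsilon_j(J)\,\overline{\phi(J\cup\{j\})}\,\psi(J) = \sum_J \overline{(a(u)\phi)(J)}\,\psi(J) = \langle a(u)\phi|\psi\rangle.$$

This implies (15.13). To prove (15.14), we note that in view of (15.8),

$$\begin{aligned} (a(u)a(v)\psi)(J) &= \sum_{j\notin J}\ \sum_{k\notin J\cup\{j\}} \varepsilon_j(J)\,\varepsilon_k(J\cup\{j\})\,\bar u_j\bar v_k\,\psi(J\cup\{j,k\}) \\ &= -\sum_{j\notin J}\ \sum_{k\notin J\cup\{j\}} \varepsilon_k(J)\,\varepsilon_j(J\cup\{k\})\,\bar u_j\bar v_k\,\psi(J\cup\{j,k\}) \\ &= -(a(v)a(u)\psi)(J). \end{aligned}$$

This proves the first formula in (15.14), and the second formula follows with (15.13). Finally, to prove (15.15), we note that

$$(a(u)a^*(v)\psi)(J) = \sum_{j\notin J}\ \sum_{k\in J\cup\{j\}} \varepsilon_j(J)\,\varepsilon_k(J\cup\{j\})\,\bar u_jv_k\,\psi(J\cup\{j\}\setminus\{k\}) \tag{15.16}$$

and

$$(a^*(v)a(u)\psi)(J) = \sum_{k\in J}\ \sum_{j\notin J\setminus\{k\}} \varepsilon_k(J)\,\varepsilon_j(J\setminus\{k\})\,\bar u_jv_k\,\psi(J\setminus\{k\}\cup\{j\}). \tag{15.17}$$

The sets of pairs $(j,k)$ over which the summation in the two equations is taken both contain the pairs with $j \notin J$, $k \in J$, for which the signs are opposite by (15.10). Thus the corresponding terms in the sums cancel when the two equations are added. The remaining terms consist in (15.16) of the terms with $k = j \notin J$ and in (15.17) of the terms with $j = k \in J$; by (15.6) and (15.7), all signs of these terms are $+1$. Therefore, adding the two equations results in

$$(a(u)a^*(v)\psi)(J) + (a^*(v)a(u)\psi)(J) = \sum_{j\notin J} \bar u_jv_j\,\psi(J) + \sum_{k\in J} \bar u_kv_k\,\psi(J) = u^*v\,\psi(J).$$

□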
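Theorem 15.2.4 can be checked numerically for the three-element label set $M=\{0,1,2\}$ by building the $8\times 8$ matrices of $a_j$ and $a_j^*$ directly from (15.5). This is a NumPy sketch; the random test vectors, the seed, and the helper names are assumptions for illustration. Note that $a(u)$ uses the conjugated components $\bar u_j$, matching (15.11).

```python
import numpy as np
from itertools import combinations

M = [0, 1, 2]
basis = [frozenset(c) for r in range(len(M) + 1) for c in combinations(M, r)]
index = {J: i for i, J in enumerate(basis)}
dim = len(basis)                       # 2**len(M) = 8


def eps(j, J):
    return 1 if sum(1 for i in J if i < j) % 2 == 0 else -1


def annihilator(j):
    """Matrix of a_j from (15.5): a_j|J> = eps_j(J)|J without j> for j in J."""
    A = np.zeros((dim, dim), dtype=complex)
    for J in basis:
        if j in J:
            A[index[J - {j}], index[J]] = eps(j, J)
    return A


def creator(j):
    """Matrix of a_j^* from (15.5): a_j^*|J> = eps_j(J)|J with j> for j not in J."""
    C = np.zeros((dim, dim), dtype=complex)
    for J in basis:
        if j not in J:
            C[index[J | {j}], index[J]] = eps(j, J)
    return C


def a(u):
    return sum(np.conj(uj) * annihilator(j) for j, uj in zip(M, u))


def a_star(u):
    return sum(uj * creator(j) for j, uj in zip(M, u))


rng = np.random.default_rng(0)
u = rng.normal(size=3) + 1j * rng.normal(size=3)
v = rng.normal(size=3) + 1j * rng.normal(size=3)
```

Here `np.vdot(u, v)` computes $u^*v = \sum_k \bar u_kv_k$, the right-hand side of (15.15).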



15.3 Exterior algebra representation

We now show another important realization of the anticommutation relations, equivalent to that of the preceding section but phrased in a different language familiar from differential geometry.

For any vector space $\mathbb H$, we consider the tensor algebra $T\mathbb H = \mathbb C \oplus \mathbb H \oplus (\mathbb H\otimes\mathbb H) \oplus \cdots$, which is an associative algebra with unity, where multiplication is the tensor product. We define $J_\pm$ to be the two-sided ideal generated by the elements $v\otimes w \pm w\otimes v$ for $v, w \in \mathbb H$. The quotients

$$T\mathbb H/J_- = \vee\mathbb H, \qquad T\mathbb H/J_+ = \wedge\mathbb H,$$

obtained by dividing out the ideals $J_\pm$, inherit a natural algebra structure, since we divided out by ideals. We call $\vee\mathbb H$ the symmetric algebra and $\wedge\mathbb H$ the exterior algebra. The product in the exterior algebra is written as $v\wedge w$ in place of $vw$, and is then called the exterior product or wedge product. The product $\wedge$ satisfies the anticommutative law

$$v\wedge w = -w\wedge v$$

for any two vectors $v, w$, but not for general elements of $\wedge\mathbb H$; for example, $u\wedge v\wedge w = w\wedge u\wedge v$.

The symmetric algebra leads to a representation of the canonical commutation relations 
(see Section 14.6); the exterior algebra to one of the canonical anticommutation relations. 

We concentrate on the latter and restrict to finite-dimensional vector spaces. If $\mathbb H$ is a vector space of finite dimension $n$, we may choose a basis $e_1,\dots,e_n$. Using the anticommutation relations, one easily verifies that the exterior algebra $\wedge\mathbb H$ has a basis consisting of the elements

$$1;\quad e_i;\quad e_i\wedge e_j\ (i<j);\quad e_i\wedge e_j\wedge e_k\ (i<j<k);\quad \dots;\quad e_1\wedge e_2\wedge\cdots\wedge e_n,$$

making a total of $2^n$ basis vectors. Thus the dimension of $\wedge\mathbb H$ is $2^n$. We now introduce operators given by

$$a_k(\omega) := e_k\wedge\omega \qquad \text{for } \omega \in \wedge\mathbb H;$$

in particular, $a_k(1) = e_k$. Similarly, we define operators $a_k^*$ on the basis elements as follows and extend them linearly. On elements of the form $\omega = e_k\wedge\omega'$ we put

$$a_k^*(\omega) = a_k^*(e_k\wedge\omega') := \omega',$$

and if we cannot write $\omega$ in the form $\omega = e_k\wedge\omega'$, we put

$$a_k^*(\omega) := 0.$$

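We now show that

$$a_ka_l^* + a_l^*a_k = \delta_{kl}. \tag{15.18}$$

It suffices to check this on a basis element $\omega$. First, assume that we can write $\omega$ neither in the form $e_k\wedge\omega'$ nor in the form $e_l\wedge\omega''$. In this case, we have

$$a_k\circ a_l^*(\omega) + a_l^*\circ a_k(\omega) = a_l^*(e_k\wedge\omega) = \delta_{kl}\,\omega.$$

If we can write $\omega = e_k\wedge\omega'$ but not in the form $e_l\wedge\omega''$, then $k \ne l$ and

$$a_k\circ a_l^*(\omega) + a_l^*\circ a_k(\omega) = a_l^*(e_k\wedge e_k\wedge\omega') = 0.$$

If $k = l$ and $\omega = e_k\wedge\omega'$, we have

$$a_k\circ a_k^*(e_k\wedge\omega') + a_k^*\circ a_k(e_k\wedge\omega') = a_k(\omega') + 0 = e_k\wedge\omega' = \omega.$$

In the last case, when $k \ne l$ but we can write $\omega = e_l\wedge\omega'$, we have

$$a_k\circ a_l^*(\omega) = a_k(\omega') = e_k\wedge\omega'$$

and

$$a_l^*\circ a_k(\omega) = a_l^*(e_k\wedge e_l\wedge\omega') = a_l^*(-e_l\wedge e_k\wedge\omega') = -e_k\wedge\omega',$$

so that the two terms cancel. Putting it all together, we indeed have (15.18).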
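A Python sketch of the exterior algebra construction, assuming basis elements $e_{i_1}\wedge\cdots\wedge e_{i_m}$ ($i_1<\cdots<i_m$) are stored as sorted tuples of 0-based indices. Here `a(k, omega)` wedges with $e_k$, and `a_star(k, omega)` removes $e_k$ after moving it to the front, as in the definition above. The data layout and names are our own choices. The check confirms (15.18) on all $2^n$ basis elements for $n=3$.

```python
from itertools import combinations


def wedge_basis(k, T):
    """e_k ∧ (basis monomial T), T a sorted tuple of indices; returns {tuple: sign}."""
    if k in T:
        return {}
    smaller = sum(1 for i in T if i < k)   # transpositions needed to sort e_k into place
    return {tuple(sorted(T + (k,))): (-1) ** smaller}


def contract_basis(k, T):
    """a_k^* on a basis monomial: write T = ±e_k ∧ ω' and return ±ω' (zero if k not in T)."""
    if k not in T:
        return {}
    pos = T.index(k)                       # moving e_k to the front costs (-1)**pos
    return {T[:pos] + T[pos + 1:]: (-1) ** pos}


def extend_linearly(op, k, omega):
    """Apply a basis-level operator to omega = {tuple: coefficient}."""
    out = {}
    for T, c in omega.items():
        for T2, s in op(k, T).items():
            out[T2] = out.get(T2, 0) + s * c
    return {T: c for T, c in out.items() if c != 0}


def add(x, y):
    out = dict(x)
    for T, c in y.items():
        out[T] = out.get(T, 0) + c
    return {T: c for T, c in out.items() if c != 0}


def a(k, omega):        # a_k(ω) = e_k ∧ ω
    return extend_linearly(wedge_basis, k, omega)


def a_star(k, omega):   # a_k^* as defined in the text
    return extend_linearly(contract_basis, k, omega)


n = 3                   # indices 0, ..., n-1 stand for e_1, ..., e_n
basis = [T for r in range(n + 1) for T in combinations(range(n), r)]
```

For example, `a(1, {(0,): 1})` returns `{(0, 1): -1}`, i.e. $e_2\wedge e_1 = -e_1\wedge e_2$ in the 1-based labels of the text.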

15.4 Spin and metaplectic representation 

In analogy to the bosonic case treated in Section 10.1, we now show that quadratic expressions in anticommuting operators $a$ and $a^*$ make up well-known finite-dimensional Lie algebras, in this case the orthogonal algebras $so(2n)$ and $so(2n+1)$.

The method of derivation is different, however. It works for bosons and fermions simultaneously, with differences only in certain signs, and gives in the bosonic case a construction of the metaplectic representation of $sp(2n,\mathbb R)$ and the central extension of $isp(2n,\mathbb R)$. In the sequel, the upper signs apply to the bosonic case, and the lower signs apply to the fermionic case. We use coordinate-independent notation, so that the method can be taken over almost literally to the infinite-dimensional case.

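We assume that we have a linear mapping $a: \mathbb H \to E$ that assigns to each $\alpha$ from some vector space $\mathbb H$ an element $a(\alpha)$ in an associative algebra $E$ with identity $1$ such that

$$[a(\alpha),a(\beta)]_\pm = a(\alpha)a(\beta) \mp a(\beta)a(\alpha) \in \mathbb C, \tag{15.19}$$

i.e., the signed commutator is a multiple of $1$. For example, with the standard generators $a_k$, $a_k^*$ in a bosonic or fermionic Fock space, we can take for $a(\alpha)$ a linear combination of the $a_k$ and $a_k^*$, summed over $k$, with coefficients given by the components of $\alpha$.

For the bosonic case, (15.19) means that the Lie algebra obtained by equipping $E$ with the commutator as Lie product contains a central extension of a commutative Lie algebra.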

The ground state on $E$ is a positive linear functional $\langle\cdot\rangle$ that satisfies $\langle 1\rangle = 1$. Since $\langle a(\alpha)a(\beta)\rangle$ is bilinear in $\alpha$ and $\beta$, there is a linear operator $G$ satisfying

$$\langle a(\alpha)a(\beta)\rangle = \alpha^TG\beta;$$

taking the expectation of (15.19) then shows that

$$a(\alpha)a(\beta) \mp a(\beta)a(\alpha) = \alpha^TJ\beta, \qquad \text{where } J := G \mp G^T.$$

Thus $J$ is antisymmetric in the bosonic case and symmetric in the fermionic case. In Section 10.1, the bilinear form $J$ was represented by an antisymmetric nondegenerate $2n\times 2n$-matrix. This is the most interesting case, although in the first part of the discussion below $J$ can be degenerate.

Any quadratic expression in the $a(\alpha)$ is a sum of terms of the form $a(\alpha)a(\beta)$; this is nothing else than the statement that any matrix is a sum of matrices of the form $M = mn^T$ for some vectors $m$ and $n$. We define the quadratic expressions³

$$N(\alpha\beta^T) := \tfrac12\big(a(\alpha)a(\beta) - \alpha^TG\beta\big)$$

and extend them by linearity (using a basis, it is easy to see that the extension is unique and well-defined). We thus have $\langle N(f)\rangle = 0$ for all matrices $f$. We also have

$$a(\alpha)a(\beta) = 2N(\alpha\beta^T) + \operatorname{tr}(G^T\alpha\beta^T).$$

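Remember that in Section 10.1 we considered the symmetric combination $2E_{ij} = p_iq_j + q_jp_i$. Motivated by this, we restrict our attention to $f = \sum_i \alpha_i\beta_i^T$ such that $f^T = \pm f$. For a single term $f = \alpha\beta^T$, we find, using (15.19) twice,

$$\begin{aligned} 2[N(f),a(\gamma)] &= a(\alpha)a(\beta)a(\gamma) - a(\gamma)a(\alpha)a(\beta) \\ &= \pm a(\alpha)a(\gamma)a(\beta) + a(\alpha)\,\beta^TJ\gamma \mp a(\alpha)a(\gamma)a(\beta) - a(\beta)\,\gamma^TJ\alpha \\ &= a(\alpha\beta^TJ\gamma) - a(\beta\alpha^TJ^T\gamma) \\ &= a(fJ\gamma - f^TJ^T\gamma), \end{aligned}$$

and this extends by linearity to all $f$. Since $J^T = \mp J$, we have $f^TJ^T = -fJ$ whenever $f^T = \pm f$, so that

$$[N(f),a(\gamma)] = N(f)a(\gamma) - a(\gamma)N(f) = a(fJ\gamma) \qquad \text{for } f^T = \pm f.$$

Similarly, for $g = \gamma\delta^T$, we find

$$\begin{aligned} 2[N(f),N(g)] &= [N(f),a(\gamma)]\,a(\delta) + a(\gamma)\,[N(f),a(\delta)] \\ &= a(fJ\gamma)\,a(\delta) + a(\gamma)\,a(fJ\delta) \\ &= 2N(fJ\gamma\delta^T) + \operatorname{tr}(G^TfJ\gamma\delta^T) + 2N(\gamma(fJ\delta)^T) + \operatorname{tr}(G^T\gamma(fJ\delta)^T) \\ &= 2N(fJg - gJf) + \operatorname{tr}\big(G^T(fJg - gJf)\big), \end{aligned}$$

since $\gamma(fJ\delta)^T = gJ^Tf^T = -gJf$. So again by linearity, and using $g^T = \pm g$ in the last step,

$$\begin{aligned} [N(f),N(g)] &= N(fJg - gJf) + \tfrac12\operatorname{tr}\big(G^T(fJg - gJf)\big) \\ &= N(fJg - gJf) + \tfrac12\operatorname{tr}\big((G^T \pm G)fJg\big). \end{aligned}$$

Writing

$$(s,\alpha,f) := N(f) + a(\alpha) + s1,$$

we find for the bosonic case

$$[(s,\alpha,f),(s',\alpha',f')] = \big(\tfrac12\operatorname{tr}((G + G^T)fJf') + \alpha^TJ\alpha',\ fJ\alpha' - f'J\alpha,\ fJf' - f'Jf\big)$$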

and for the fermionic case

$$[(s,\alpha,f),(s',\alpha',f')] = \big(S(f,f',\alpha,\alpha'),\ fJ\alpha' - f'J\alpha,\ fJf' - f'Jf + 2(\alpha\alpha'^T - \alpha'\alpha^T)\big),$$

where

$$S(f,f',\alpha,\alpha') = \tfrac12\operatorname{tr}\big((G^T - G)fJf'\big) + \alpha^T(G - G^T)\alpha'.$$

Here we used that, for fermions, $[a(\alpha),a(\alpha')] = 2N(\alpha\alpha'^T - \alpha'\alpha^T) + \alpha^T(G - G^T)\alpha'$.

³In infinite dimensions, this amounts to a renormalization step that is conventionally described as "subtracting infinite constants" arising at a later stage of the development. Our formulas are renormalized from the outset, and no infinite constants arise.
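The commutator formulas can be tested numerically in the fermionic case with $n=2$. The sketch below assumes a concrete realization not taken from the text. It uses four Hermitian generators $c_1,\dots,c_4$ on $\mathbb C^4$, built from Pauli matrices in Jordan-Wigner form and normalized so that $c_ic_j+c_jc_i=\delta_{ij}$. With $a(\alpha)=\sum_i\alpha_ic_i$ this gives $J=1$. The ground state functional $\langle\cdot\rangle$ is given by an arbitrarily chosen unit vector. For random antisymmetric $f$, $g$, the sketch checks $J=G+G^T$, $[N(f),a(\gamma)]=a(fJ\gamma)$, the formula for $[N(f),N(g)]$ with the lower sign, and the fermionic formula for $[a(\alpha),a(\alpha')]$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Hermitian generators with c_i c_j + c_j c_i = delta_ij (assumed realization, J = 1)
c = [np.kron(X, I2), np.kron(Y, I2), np.kron(Z, X), np.kron(Z, Y)]
c = [op / np.sqrt(2) for op in c]
dim = len(c)                              # dim = 2n with n = 2
one = np.eye(4, dtype=complex)

# Reference state defining <.>; chosen for illustration only
psi = np.zeros(4, dtype=complex)
psi[0] = 1.0
G = np.array([[psi.conj() @ ci @ cj @ psi for cj in c] for ci in c])


def a(alpha):
    """a(alpha) = sum_i alpha_i c_i."""
    return sum(al * ci for al, ci in zip(alpha, c))


def N(f):
    """Linear extension of N(alpha beta^T) = (a(alpha) a(beta) - alpha^T G beta) / 2."""
    return 0.5 * sum(f[i, j] * (c[i] @ c[j] - G[i, j] * one)
                     for i in range(dim) for j in range(dim))


def comm(x, y):
    return x @ y - y @ x


rng = np.random.default_rng(1)


def random_antisymmetric():
    r = rng.normal(size=(dim, dim))
    return r - r.T


f, g = random_antisymmetric(), random_antisymmetric()
gamma, alpha, alpha2 = rng.normal(size=(3, dim))
```

The assertions use exactly the expressions displayed above with $J=1$; in particular, the central term of $[N(f),N(g)]$ is $\frac12\operatorname{tr}((G^T-G)fg)$.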

Fermionic case. To exploit these formulas, we first focus on the fermionic case and assume that $J$ is nondegenerate; without loss of generality, we may choose $J$ to be the $2n\times 2n$ identity matrix, $J = 1$.

We consider the Lie algebra $L$ defined by the quadratic elements modulo the constant term; that is, we factor out the center. We write $(\alpha,f)$ for the equivalence class of $(s,\alpha,f)$. The quadratic expressions $f$ are antisymmetric, $f^T = -f$, and thus correspond to $so(2n,\mathbb C)$. Let us consider the map $\mu: L \to sl(2n+1,\mathbb C)$ defined by

$$\mu(\alpha,f) := \begin{pmatrix} f & \sqrt2\,\alpha \\ \sqrt2\,\alpha^T & 0 \end{pmatrix}.$$

The map $\mu$ is injective and preserves the Lie product, and thus is an isomorphism onto its image. The image under $\mu$ of $L$ is the Lie algebra $so(2n+1,\mathbb C)$. It can easily be seen that matrices $X$ in the image satisfy

$$X^T\eta + \eta X = 0, \qquad \eta := \begin{pmatrix} J & 0 \\ 0 & -1 \end{pmatrix},$$

since, for fermions, $J$ is the $2n\times 2n$ identity matrix. Restricting this basis to the real numbers, we obtain the real form $so(2n,1)$. Summarizing, we have thus established that the quadratic elements (with center) form a central extension of $so(2n+1,\mathbb C)$. The purely quadratic expressions (no linear and constant terms) form the Lie algebra $so(2n,\mathbb C)$. Note that the group $O(2n,\mathbb C)$ is the automorphism group of the algebra defined by the relation

$$b_kb_l^* + b_l^*b_k = \delta_{kl}. \tag{15.20}$$

Going to the 'real' basis consisting of the elements $b_k + b_k^*$ and $-i(b_k - b_k^*)$, we see that the real Lie group $O(2n)$ preserves the relations (15.20).

For a finite number of generators, the canonical anticommutation relations have a unique irreducible unitary representation (up to unitary equivalence). Therefore, as in Section 14.6, we can say something interesting about the automorphism group of the algebra defined by (15.20). Performing a rotation $b_k \to b_k' := \sum_l g_{kl}b_l$ with $g = (g_{kl})$ an element of $SO(2n)$ on the generators $b_k$, we get another representation of the canonical anticommutation relations; but since this representation is unique, there exists a unitary transformation $U(g)$ that relates the obtained representation to the original one: $b_k' = U(g)b_kU(g)^{-1}$, where we simply wrote $b_k$ for the representation of $b_k$. Again, $U(g)$ is not unique for a given $g$, since $-U(g)$ also does the job. In this way, we get a double cover of the group $SO(2n)$, called the spin group $\mathrm{Spin}(2n)$, just as in Section 14.6 we obtained the metaplectic cover.
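The double cover can be seen explicitly in an $n=2$ model. As assumptions of this sketch, we use Jordan-Wigner generators $c_i$ with $c_ic_j+c_jc_i=\delta_{ij}$, and a helper `expm` that is a plain scaling-and-squaring Taylor series, so the code needs only NumPy. For antisymmetric $f$, conjugation by $U=\exp(\frac12\sum_{ij}f_{ij}c_ic_j)$ rotates the generators by $R=e^f\in SO(4)$, and so does $-U$. For a full turn by $2\pi$ in one coordinate plane, $R$ returns to the identity while $U$ becomes $-1$. This is the sign ambiguity behind $\mathrm{Spin}(2n)\to SO(2n)$. We drop the constant term of $N(f)$ here, since it would only multiply $U$ by a scalar.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
c = [op / np.sqrt(2) for op in
     (np.kron(X, I2), np.kron(Y, I2), np.kron(Z, X), np.kron(Z, Y))]


def expm(A, squarings=10, terms=25):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    B = A / 2 ** squarings
    E = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E


def quadratic(f):
    """(1/2) sum_ij f_ij c_i c_j, i.e. N(f) without its constant term."""
    return 0.5 * sum(f[i, j] * c[i] @ c[j] for i in range(4) for j in range(4))


def rotation_generator(theta):
    """Antisymmetric f generating a rotation by theta in the (c_1, c_2) plane."""
    f = np.zeros((4, 4))
    f[0, 1], f[1, 0] = theta, -theta
    return f


def conjugates_to_rotation(U, R):
    """True if U c_k U^{-1} = sum_i R_ik c_i for every k."""
    U_inv = np.linalg.inv(U)
    return all(np.allclose(U @ c[k] @ U_inv, sum(R[i, k] * c[i] for i in range(4)))
               for k in range(4))


f = rotation_generator(0.7)
U, R = expm(quadratic(f)), expm(f).real
U_turn = expm(quadratic(rotation_generator(2 * np.pi)))   # full turn
R_turn = expm(rotation_generator(2 * np.pi)).real
```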
Bosonic case. For the bosonic case, we may proceed in an analogous way. Again, we assume that $J$ is nondegenerate; this time, the normal form can be taken without loss of generality as an antisymmetric $2n\times 2n$-matrix $J$ that squares to $-1$.

We again form the Lie algebra $L$ of inhomogeneous quadratic expressions and factor out the center. We then apply to the equivalence classes $(\alpha,f)$ the map

$$u: (\alpha,f) \mapsto \begin{pmatrix} Jf & J\alpha \\ 0 & 0 \end{pmatrix}.$$

It is clear that

$$J(Jf) + (Jf)^TJ = -f + f = 0,$$

so that $Jf \in sp(2n)$, and the map $u$ is an isomorphism from $L$ to $isp(2n)$. We thus see that the inhomogeneous quadratic quantities form a central extension of the Lie algebra $isp(2n)$.
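For the bosonic case, a quick NumPy check of the two properties used above. We choose the standard symplectic $J$ as the normal form. With it, $Jf$ satisfies $(Jf)^TJ+J(Jf)=0$ for symmetric $f$. The block-matrix form of $u$, written out here as an explicit assumption, turns the bracket of classes $(\alpha,f)$, $(\alpha',f')$ computed above (modulo the center) into the matrix commutator.

```python
import numpy as np

n = 2
I_n, O_n = np.eye(n), np.zeros((n, n))
J = np.block([[O_n, I_n], [-I_n, O_n]])   # antisymmetric normal form with J @ J = -1


def u(alpha, f):
    """Assumed block form u(alpha, f) = [[J f, J alpha], [0, 0]] in isp(2n)."""
    top = np.hstack([J @ f, (J @ alpha)[:, None]])
    return np.vstack([top, np.zeros((1, 2 * n + 1))])


def bracket(x, y):
    """Bosonic Lie product of classes (alpha, f), modulo the center."""
    (alpha, f), (beta, g) = x, y
    return (f @ J @ beta - g @ J @ alpha, f @ J @ g - g @ J @ f)


def comm(X, Y):
    return X @ Y - Y @ X


rng = np.random.default_rng(3)


def random_symmetric():
    r = rng.normal(size=(2 * n, 2 * n))
    return r + r.T


x = (rng.normal(size=2 * n), random_symmetric())
y = (rng.normal(size=2 * n), random_symmetric())
Jf = J @ x[1]
```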



15.5 Spinor representations of $GL(2,\mathbb C)$ and $SU(2)$

As a preparation for the construction of spin coherent states, we discuss the spinor representations of $GL(2,\mathbb C)$; see also Sternberg [234]. By restricting to the unitary matrices, we get unitary representations of the group $SU(2)$. As we shall see later in Section 16.3, these representations comprise all irreducible unitary representations of $SU(2)$.

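Each complex $2\times 2$-matrix can be written as a complex linear combination of the identity matrix $\sigma_0$ and the Pauli matrices. We write (see also (1.23))

$$p\cdot\sigma_\pm := p_0\sigma_0 \mp \vec p\cdot\vec\sigma, \tag{15.21}$$

where

$$\sigma_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$

and $\vec\sigma$ is the vector $\vec\sigma = (\sigma_1,\sigma_2,\sigma_3)$. We note the formula

$$(p\cdot\sigma_-)(p\cdot\sigma_+) = (p\cdot p)\,\sigma_0 = \det(p\cdot\sigma_\pm)\,\sigma_0 \qquad \text{for } p \in \mathbb C^{1,3},$$

where $p\cdot q$ is the $(+,-,-,-)$ Minkowski inner product

$$p\cdot q := p_0q_0 - p_1q_1 - p_2q_2 - p_3q_3.$$

We have

$$(p\cdot\sigma_\pm)^* = \bar p\cdot\sigma_\pm,$$

so that $p\cdot\sigma_\pm$ is Hermitian if and only if $p$ is real-valued. Therefore, we can identify the elements $p\cdot\sigma_\pm$ with $u(2)$:

$$u(2) = \{p\cdot\sigma_\pm \mid p \in \mathbb R^{1,3}\},$$

and letting $p$ take complex values, we get the whole of $gl(2,\mathbb C)$. We obtain $su(2)$ for real $p$ with $p_0 = 0$.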
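The identities for $p\cdot\sigma_\pm$ are easy to spot-check numerically. Below is a NumPy sketch with random complex and real four-vectors; the function names and the sign encoding (`sign=+1` for $\sigma_+$, `sign=-1` for $\sigma_-$) are our own.

```python
import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def p_sigma(p, sign):
    """p . sigma_(+/-) = p0 sigma_0 -/+ (p1 sigma_1 + p2 sigma_2 + p3 sigma_3), cf. (15.21)."""
    return p[0] * sigma[0] - sign * sum(p[i] * sigma[i] for i in range(1, 4))


def minkowski(p, q):
    """(+,-,-,-) inner product p.q, without complex conjugation."""
    return p[0] * q[0] - p[1] * q[1] - p[2] * q[2] - p[3] * q[3]


rng = np.random.default_rng(4)
p_complex = rng.normal(size=4) + 1j * rng.normal(size=4)
p_real = rng.normal(size=4)
```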
For $0 \le s \in \frac12\mathbb Z$, we denote by $P_s$ the space of all homogeneous polynomials of degree $2s$ in $z = (z_1,z_2) \in \mathbb C^2$. The space $P_s$ has dimension $2s+1$, since the monomials $z_1^kz_2^{2s-k}$ ($k = 0,1,\dots,2s$) form a basis of $P_s$. The group $GL(2,\mathbb C)$ of invertible complex $2\times 2$ matrices acts on $\mathbb C^2$ in the natural way. On $P_s$ we get an induced representation of $GL(2,\mathbb C)$ by means of the formula

$$(U(g)\psi)(z) := \psi(g^{-1}z) \qquad \text{for } g \in GL(2,\mathbb C). \tag{15.22}$$

Then indeed $(U(g)U(h)\psi)(z) = (U(h)\psi)(g^{-1}z) = \psi(h^{-1}g^{-1}z) = \psi((gh)^{-1}z) = (U(gh)\psi)(z)$. Taking infinitesimal group elements, we find that the Lie algebra $gl(2,\mathbb C)$ acts on $P_s$ by means of the representation $J$ defined by

$$(J(A)\psi)(z) := -(Az)\cdot\nabla\psi(z) \qquad \text{for } A \in gl(2,\mathbb C). \tag{15.23}$$
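A small numerical illustration of (15.22) and (15.23). As an implementation choice for this sketch, elements of $P_s$ are treated simply as Python functions on $\mathbb C^2$. The code checks $U(g)U(h)=U(gh)$ pointwise for random $g,h$. It also compares a central difference of $t\mapsto (U(1+tA)\psi)(z)$ at $t=0$ with $(J(A)\psi)(z)$ for the monomial $\psi = z_1^2z_2\in P_{3/2}$. Using $1+tA$ instead of $e^{tA}$ is harmless here, since only the first-order term matters.

```python
import numpy as np


def U(g, psi):
    """(U(g) psi)(z) = psi(g^{-1} z), cf. (15.22); psi is any function on C^2."""
    g_inv = np.linalg.inv(g)
    return lambda z: psi(g_inv @ z)


def monomial(k, m):
    """The basis polynomial z1^k z2^m (homogeneous of degree 2s = k + m)."""
    return lambda z: z[0] ** k * z[1] ** m


def J_monomial(A, k, m):
    """(J(A) psi)(z) = -(Az) . grad psi(z) for psi = z1^k z2^m, cf. (15.23)."""
    def value(z):
        Az = A @ z
        d1 = k * z[0] ** (k - 1) * z[1] ** m if k > 0 else 0
        d2 = m * z[0] ** k * z[1] ** (m - 1) if m > 0 else 0
        return -(Az[0] * d1 + Az[1] * d2)
    return value


rng = np.random.default_rng(5)


def random_matrix():
    return rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))


g, h, A = random_matrix(), random_matrix(), random_matrix()
z = rng.normal(size=2) + 1j * rng.normal(size=2)
psi = monomial(2, 1)                 # an element of P_{3/2}

# Central difference of t -> (U(1 + tA) psi)(z) at t = 0
t = 1e-5
one = np.eye(2)
derivative = (U(one + t * A, psi)(z) - U(one - t * A, psi)(z)) / (2 * t)
```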



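The unitary case. By restricting in (15.21) to real-valued $p$ with $p_0 = 0$, we represent $su(2)$. The resulting representation turns out to be unitary. To give $P_s$ the appropriate Hilbert space structure, we define on the unit ball

$$D := \{z \in \mathbb C^2 \mid z^*z < 1\}$$

the measure $Dz$ by

$$\int Dz\,f(z) := \int_D dz\,f(z), \tag{15.24}$$

where $dz$ is the Lebesgue measure on $\mathbb C^2 \cong \mathbb R^4$. Explicitly, we thus have $Dz = d^2z_1\,d^2z_2$ on $D$, so that, for example, using polar coordinates in each $z_i$ and then substituting $s = x^2$, $t = y^2$,

$$\begin{aligned} \int Dz\,z_1^k\bar z_1^lz_2^m\bar z_2^n &= 4\pi^2\delta_{kl}\delta_{mn}\int_{x,y>0,\ x^2+y^2<1} x^{2k+1}y^{2m+1}\,dx\,dy \\ &= \pi^2\delta_{kl}\delta_{mn}\int_{s,t>0,\ s+t<1} s^kt^m\,ds\,dt \\ &= \pi^2\delta_{kl}\delta_{mn}\,\frac{k!\,m!}{(k+m+2)!}, \end{aligned}$$

where in the last step we used

$$\int_0^1 s^a(1-s)^b\,ds = \frac{a!\,b!}{(a+b+1)!}.$$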
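15.5.1 Proposition. $Dz$ is an $SU(2)$-invariant measure satisfying

$$\int Dz\,(z^*x)^{2s}(y^*z)^{2s} = \gamma_s\,(y^*x)^{2s} \qquad \text{for } x, y \in \mathbb C^2,\ 0 \le s \in \tfrac12\mathbb Z, \tag{15.25}$$

where

$$\gamma_s = \frac{\pi^2}{(2s+1)(2s+2)}. \tag{15.26}$$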




CHAPTER 15. SPIN AND FERMIONS 



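Proof. Under a change of integration variables $z' = gz$ with $g \in SU(2)$, we have $d^2z_1'\,d^2z_2' = |\det g|^2\,d^2z_1\,d^2z_2 = d^2z_1\,d^2z_2$, and the unit ball $D$ is mapped onto itself. Hence if we use for $g \in SU(2)$ the substitution $z = gz'$, the integral in (15.25) transforms into the same integral with $(x',y') = (g^*x, g^*y)$ in place of $(x,y)$. Thus it is invariant under $SU(2)$, and since $(g^*y)^*(g^*x) = y^*x$, it suffices to evaluate it after rotating $x$ such that $x = (x_1,0)^T$. Then $(z^*x)^{2s} = x_1^{2s}\bar z_1^{2s}$, and expanding $(y^*z)^{2s}$ gives terms proportional to $x_1^{2s}\bar y_1^m\bar y_2^{2s-m}\,\bar z_1^{2s}z_1^mz_2^{2s-m}$; by the moment formula above, only the term with $m = 2s$ survives the integration. Hence the right-hand side of (15.25) is fixed up to the constant $\gamma_s$, which is found by looking at the special case $x = y = (1,0)^T$:

$$\gamma_s = \int Dz\,|z_1|^{4s} = \pi^2\,\frac{(2s)!}{(2s+2)!} = \frac{\pi^2}{(2s+1)(2s+2)}. \tag{15.27}$$

□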



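We make $P_s$ into a Hilbert space by giving it the inner product

$$\langle\phi|\psi\rangle := \gamma_s^{-1}\int Dz\,\overline{\phi(z)}\,\psi(z). \tag{15.28}$$

We introduce the basis vectors $\pi_k^{(s)} := z_1^kz_2^{2s-k}$ for $P_s$, in terms of which the inner product reads

$$\langle\pi_k^{(s)}|\pi_l^{(s)}\rangle = \delta_{kl}\,\frac{k!\,(2s-k)!}{(2s)!}.$$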



For $x \in \mathbb C^2$, we define the coherent state $|x,s\rangle \in P_s$ to be the function

$$|x,s\rangle(z) := (x^*z)^{2s} = (\bar x_1z_1 + \bar x_2z_2)^{2s}. \tag{15.29}$$

Then we can restate (15.25) as

$$\langle x,s|y,s\rangle = (y^*x)^{2s} \qquad \text{for } x, y \in \mathbb C^2. \tag{15.30}$$

In particular, for $s > 0$, the coherent state $|x,s\rangle$ is normalized to norm 1 if and only if $x$ has norm 1.
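Formula (15.30) can be checked without any integration, using only the orthogonality relations of the basis $\pi_k^{(s)}$ stated above. The idea is to expand two coherent states in this basis and compute their inner product. Below is a NumPy sketch for $s = 3/2$ with random $x$, $y$; the names are ours.

```python
import numpy as np
from math import comb, factorial


def coherent_coefficients(x, n):
    """Coefficients of |x,s>(z) = (conj(x1) z1 + conj(x2) z2)^n, n = 2s, in the basis pi_k."""
    return np.array([comb(n, k) * np.conj(x[0]) ** k * np.conj(x[1]) ** (n - k)
                     for k in range(n + 1)])


def inner(phi, psi, n):
    """<phi|psi> from basis coefficients, using <pi_k|pi_l> = delta_kl k!(n-k)!/n!."""
    weights = np.array([factorial(k) * factorial(n - k) / factorial(n)
                        for k in range(n + 1)])
    return np.sum(np.conj(phi) * psi * weights)


rng = np.random.default_rng(6)
x = rng.normal(size=2) + 1j * rng.normal(size=2)
y = rng.normal(size=2) + 1j * rng.normal(size=2)
n = 3                                    # n = 2s, i.e. spin s = 3/2
x_unit = x / np.linalg.norm(x)
```

Here `np.vdot(y, x)` computes $y^*x$, so (15.30) reads `inner(...) == np.vdot(y, x) ** n`.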
Directly from (15.29), we see that (for $s > 0$)

$$|0,s\rangle = 0, \qquad |\lambda x,s\rangle = \bar\lambda^{2s}|x,s\rangle, \tag{15.31}$$

so that it suffices in principle to look at coherent states with $x$ of norm 1. In particular, a suitable parametrization of such $x$ gives the traditional spin coherent states of Radcliffe [201]. For coherent states, (15.22) implies

$$U(g)|x,s\rangle = |g^{-*}x,s\rangle \qquad \text{for } g \in GL(2,\mathbb C), \tag{15.32}$$

where $g^{-*} := (g^{-1})^*$. Thus coherent states form a representation of $GL(2,\mathbb C)$; this is the spinor representation of $GL(2,\mathbb C)$. We verify that we correctly have $U(g)U(h)|x,s\rangle = |g^{-*}h^{-*}x,s\rangle = |(gh)^{-*}x,s\rangle = U(gh)|x,s\rangle$. One sees easily that the subgroup $SU(2)$ is represented unitarily, and we have

$$U(g)|x,s\rangle = |gx,s\rangle \qquad \text{for } g \in SU(2).$$
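A pointwise check of the transformation rule (15.32). For a random $g\in GL(2,\mathbb C)$, $U(g)|x,s\rangle$ should agree with $|g^{-*}x,s\rangle$. For a random element of $SU(2)$, built from a normalized pair $(\alpha,\beta)$ as $\begin{pmatrix}\alpha&-\bar\beta\\\beta&\bar\alpha\end{pmatrix}$, it should agree with $|gx,s\rangle$. Python with NumPy; the helper names are illustrative.

```python
import numpy as np


def coherent(x, n):
    """The coherent state |x,s> as the function z -> (x^* z)^n with n = 2s, cf. (15.29)."""
    return lambda z: np.vdot(x, z) ** n


def U(g, psi):
    """(U(g) psi)(z) = psi(g^{-1} z), cf. (15.22)."""
    g_inv = np.linalg.inv(g)
    return lambda z: psi(g_inv @ z)


rng = np.random.default_rng(7)
n = 3
x = rng.normal(size=2) + 1j * rng.normal(size=2)
z = rng.normal(size=2) + 1j * rng.normal(size=2)
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))   # generic element of GL(2,C)
g_inv_adj = np.linalg.inv(g).conj().T                        # g^{-*}

alpha, beta = rng.normal(size=2) + 1j * rng.normal(size=2)
r = np.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
alpha, beta = alpha / r, beta / r
k = np.array([[alpha, -np.conj(beta)], [beta, np.conj(alpha)]])  # element of SU(2)
```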
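Expanding $(x^*z)^{2s}$ using the binomial theorem, we obtain

$$|x,s\rangle = \sum_{k=0}^{2s}\binom{2s}{k}\bar x_1^k\bar x_2^{2s-k}\,\pi_k^{(s)},$$

from which it follows (treating $\bar x_1$, $\bar x_2$ as independent variables) that

$$\pi_k^{(s)} = \frac{1}{(2s)!}\,\frac{\partial^{2s}}{\partial\bar x_1^k\,\partial\bar x_2^{2s-k}}\,|x,s\rangle,$$

so that the coherent states span $P_s$. From (15.30), we find

$$\langle x,s|y,s\rangle = |y,s\rangle(x) \tag{15.33}$$

for all coherent states $|y,s\rangle$, and since these span $P_s$, we have

$$\langle x,s|\psi\rangle = \psi(x) \qquad \text{for all } \psi \in P_s.$$

In general, let $H$ be a Hilbert space of functions on some space $\Omega$. Suppose we can write the evaluation map $\mathrm{ev}_x: f \mapsto f(x)$ for all $x \in \Omega$ as $\mathrm{ev}_x(f) = \langle g_x|f\rangle$ for some $g_x$ in $H$; then we say that $H$ has the reproducing kernel property. The space $P_s$ thus has the reproducing kernel property, which implies that we can reproduce elements as follows. For all $\phi, \psi \in P_s$, we have

$$\langle\phi|\psi\rangle = \gamma_s^{-1}\int Dz\,\overline{\phi(z)}\,\psi(z) = \gamma_s^{-1}\int Dz\,\langle\phi|z,s\rangle\langle z,s|\psi\rangle, \tag{15.34}$$

from which it follows that we can reproduce

$$\psi = \gamma_s^{-1}\int Dx\,\psi(x)\,|x,s\rangle \qquad \text{for all } \psi \in P_s. \tag{15.35}$$

Equation (15.35) implies the completeness relation

$$\gamma_s^{-1}\int Dz\,|z,s\rangle\langle z,s| = 1. \tag{15.36}$$

Thus all properties familiar from coherent states are valid.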

We close this section by mentioning some further properties of spin coherent states. Because of the identity

$$|-x,s\rangle = (-1)^{2s}|x,s\rangle,$$

fermions ($s \notin \mathbb Z$) are called chiral. The center of $SL(2,\mathbb C)$ is the group $\mathbb Z_2 = \{-1,1\}$, and the quotient $PSL(2,\mathbb C) = SL(2,\mathbb C)/\{-1,1\}$ is isomorphic to the restricted Lorentz group $SO(1,3)^+$, which is defined to be the connected component of the identity in the group $SO(1,3)$. In Section 15.6 we indicate why $SO(1,3)^+ \cong PSL(2,\mathbb C)$. Since fermions are chiral, they are not invariant under the $\mathbb Z_2$-subgroup of $SL(2,\mathbb C)$, and thus fermions do not constitute a representation of the restricted Lorentz group.

In quantum mechanics, an elementary particle with spin $s$ is described by an element of the space $L^2(\mathbb R^3,P_s)$ of square integrable mappings from $\mathbb R^3$ to $P_s$. The Hamiltonian for a particle with spin $s$ in a magnetic field is given by

$$H = J\begin{pmatrix} B_3 & B_1 + iB_2 \\ B_1 - iB_2 & -B_3 \end{pmatrix},$$

where the action of $J$ is given by (15.23). The dynamics is described by the Schrödinger equation

$$i\hbar\dot\psi = H\psi.$$

Since $H$ is the image under $J$ of an element of $su(2)$, equation (15.32) implies temporal stability, which means that if the initial wave function is a coherent state, then under the time evolution determined by $H$ the wave function remains a coherent state. Since the norm of the wave function is invariant under the dynamics, too, one can work with normalized coherent states throughout.



15.6 The isomorphism $SO(3,1)^+ \cong SL(2,\mathbb C)/\mathbb Z_2$

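We use the notation introduced in Section 15.5 and identify four-vectors $p \in \mathbb R^{1,3}$ with the $2\times 2$-matrices $p\cdot\sigma_+$. For any four-vector $p \in \mathbb R^{1,3}$, the Minkowski norm is given by

$$\det(p\cdot\sigma_+) = p\cdot p.$$

The group $SL(2,\mathbb C)$ acts on $\mathbb R^{1,3}$ through

$$p\cdot\sigma_+ \mapsto A\,(p\cdot\sigma_+)\,A^* \qquad \text{for } A \in SL(2,\mathbb C).$$

Since $A(p\cdot\sigma_+)A^*$ is again Hermitian and $\det(A(p\cdot\sigma_+)A^*) = |\det A|^2\det(p\cdot\sigma_+) = p\cdot p$, this defines for each $A \in SL(2,\mathbb C)$ a Lorentz transformation, which lies in $SO(3,1)$ by continuity, since $SL(2,\mathbb C)$ is connected (see below). Hence we have a map $SL(2,\mathbb C) \to SO(3,1)$. The group $SL(2,\mathbb C)$ is a real connected manifold of dimension 6. Indeed, any complex $2\times 2$ matrix has 4 complex entries, making 8 real numbers. The constraint $\det A = 1$ gives two equations, for the real and imaginary parts, and hence removes two dimensions.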

Let us show that $SL(2,\mathbb C)$ is connected. For $A \in SL(2,\mathbb C)$, we can apply the Gram-Schmidt process to the column vectors of $A$. Looking at how the Gram-Schmidt procedure works, we see that any element $A \in SL(2,\mathbb C)$ can be written as a product $A = UN$ of a unitary matrix $U \in U(2)$ and an upper triangular matrix $N$ with positive entries on the diagonal. We can write $U = e^{i\varphi}U'$ with $U' \in SU(2)$, so that $U(2)$ is the continuous image of $S^1\times SU(2) \cong S^1\times S^3$; hence $U(2)$ is connected, and the matrix $U$ can be smoothly connected to the identity. For $N$ we may write

$$N = \begin{pmatrix} a & b \\ 0 & c \end{pmatrix}$$

with $ac = 1$ and $a > 0$, $c > 0$. Then $t \mapsto tN + (1-t)1_{2\times 2}$ is a smooth path in $GL(2,\mathbb C)$ for $t \in [0,1]$ that connects the unit matrix to $N$. Dividing by the square root of the determinant gives the required path in $SL(2,\mathbb C)$. Hence $SL(2,\mathbb C)$ is connected.

The map $SL(2,\mathbb C) \to SO(3,1)$ is a smooth group homomorphism, and thus any two points in the image can be joined by a smooth path. Hence the image is a connected subgroup of $SO(3,1)$. Since the dimensions of $SO(3,1)$ and $SL(2,\mathbb C)$ are the same, the image contains an open connected neighborhood $O$ of the identity (this is nothing more than the statement that the induced map $sl(2,\mathbb C) \to so(3,1)$ is an isomorphism). But the subgroup of $SO(3,1)$ generated by a small open neighborhood of the identity is the connected component containing the identity. Indeed, call $G'$ the group generated by the open neighborhood $O$. We may assume $O^{-1} := \{g^{-1} \mid g \in O\} = O$, since if not, we just replace $O$ by $O\cap O^{-1}$. If $x \in G'$, then $xO \subseteq G'$ is an open neighborhood containing $x$, so that $G'$ is open. If $x \notin G'$, then $xO \cap G' = \emptyset$, since if $y \in xO\cap G'$, then there is $z \in O$ such that $xz = y$, but then $x = yz^{-1}$ lies in $G'$; hence the complement of $G'$ is open as well. Hence $G'$ is an open and closed subgroup of the component that contains the identity, but then $G'$ is the component that contains the identity.

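Hence the map $SL(2,\mathbb C) \to SO(3,1)^+$ is surjective, and we only have to check the kernel. An element $A$ is in the kernel if and only if $A(p\cdot\sigma_+)A^* = p\cdot\sigma_+$ for all $p \in \mathbb R^{1,3}$. This is a linear equation in $p$, so we may as well take $p \in \mathbb C^{1,3}$. Choosing $p$ such that $p\cdot\sigma_+ = \sigma_1 + i\sigma_2$ and writing

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

we find $a\bar b = 0$, $a\bar d = 1$ and $c\bar d = 0$. Hence $b = c = 0$ and $a\bar d = 1$. But since $\det A = 1$, we also have $ad = 1$, so that $d = \bar d$ is real; taking $p\cdot\sigma_+ = \sigma_0$ gives $AA^* = 1$, hence $|a| = |d| = 1$, and therefore $a = d = \pm 1$. The kernel is therefore the normal subgroup

$$\mathbb Z_2 = \{1,-1\}.$$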
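The homomorphism $A\mapsto\Lambda(A)$ and its kernel can be illustrated numerically. The sketch below builds the $4\times 4$ matrix $\Lambda(A)$ from $A(p\cdot\sigma_+)A^*$ column by column. It then checks that $\Lambda(A)$ is a real Lorentz matrix with $\det\Lambda = 1$ and $\Lambda_{00}\ge 1$, and that $A$ and $-A$ give the same $\Lambda$. NumPy; the random $A$, the seed, and the function names are our own.

```python
import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def to_matrix(p):
    """p . sigma_+ = p0 sigma_0 - (p1 sigma_1 + p2 sigma_2 + p3 sigma_3)."""
    return p[0] * sigma[0] - sum(p[i] * sigma[i] for i in range(1, 4))


def to_vector(H):
    """Inverse of to_matrix, using tr(sigma_i sigma_j) = 2 delta_ij."""
    return np.array([np.trace(H) / 2] + [-np.trace(H @ sigma[i]) / 2 for i in range(1, 4)])


def lorentz(A):
    """Matrix Lambda(A) with A (p . sigma_+) A^* = (Lambda(A) p) . sigma_+."""
    columns = [to_vector(A @ to_matrix(e) @ A.conj().T) for e in np.eye(4)]
    return np.array(columns).T


rng = np.random.default_rng(8)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = A / np.sqrt(np.linalg.det(A))        # now det A = 1
Lam = lorentz(A)
eta = np.diag([1.0, -1.0, -1.0, -1.0])
```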
Chapter 16

Highest weight representations 



This chapter discusses highest weight representations, providing tools for classifying many 
irreducible representations of interest. We extend the ladder technique used in Section 14.3 
for determining the unitary representations of the oscillator algebra to some other small 
Lie algebras of interest, and indicate how the ideas generalize further. 

The basic ingredient is a triangular decomposition, which exists for all finite-dimensional 
semisimple Lie algebras, but also in other cases of interest such as the oscillator algebra, 
the Heisenberg algebra with the harmonic oscillator Hamiltonian adjoined. 

We look in detail at 4-dimensional Lie algebras with a nontrivial triangular decomposition (among them the oscillator algebra and $so(3)$), which behave almost like the oscillator algebra. As a result, the analysis leading to Fock spaces generalizes without problems, and we are able to classify all irreducible unitary representations of the rotation group. Various related material concerning $SO(3)$ and its universal covering group $SU(2)$ is also included.



16.1 Triangular decompositions 

Let $L$ be a Lie *-algebra. A triangular decomposition of $L$ consists of Lie subalgebras $L_-$, $L_0$ and $L_+$ of $L$ satisfying the properties¹

(T1) $L = L_- \oplus L_0 \oplus L_+$,

(T2) $L_0 \angle L_\pm \subseteq L_\pm$,

(T3) $L_0^* = L_0$, $L_\pm^* = L_\mp$,

(T4) $L_0$ is abelian and contains the center $Z(L)$.

¹Note that the present concept of a triangular decomposition is less demanding and hence more general than in the treatment by Moody & Pianzola [173]. Their additional restrictions allow them to extend much of the finite-dimensional semisimple theory outlined below to the infinite-dimensional case.
Triangular decompositions generalize the properties of annihilation and creation operators in the oscillator algebra os(1) to more general Lie algebras. The terminology derives from the following motivating examples.

16.1.1 Examples. (i) In the Lie algebra $L = gl(n,\mathbb{C}) = \mathbb{C}^{n\times n}$, we can define a triangular decomposition by defining $L_0$ to be the Lie subalgebra of diagonal matrices, $L_+$ to be the Lie subalgebra of strictly upper triangular matrices, and $L_-$ to be the Lie subalgebra of strictly lower triangular matrices. Verification of the axioms is straightforward.

(ii) The oscillator algebra os(1) has a triangular decomposition, given by

$$L_- = \mathbb{C}a, \quad L_0 = \mathbb{C}1 + \mathbb{C}n, \quad L_+ = \mathbb{C}a^*.$$
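For Example (i), the axioms can also be checked numerically on matrix units for small $n$. This Python sketch assumes that the *-operation on $gl(n,\mathbb{C})$ is the conjugate transpose. It uses the commutator as Lie product; the Lie product of this book may differ by a constant factor, which does not affect these subspace checks. The returned rank uses the fact that the center of $gl(n,\mathbb{C})$ consists of the scalar matrices. All function names are illustrative.

```python
import numpy as np
from itertools import product


def matrix_units(n, part):
    """E_ij spanning L_0 (part '0', diagonal), L_+ ('+', i < j) or L_- ('-', i > j)."""
    units = []
    for i, j in product(range(n), repeat=2):
        if {"0": i == j, "+": i < j, "-": i > j}[part]:
            E = np.zeros((n, n), dtype=complex)
            E[i, j] = 1
            units.append(E)
    return units


def lies_in(M, part):
    """True if M vanishes outside the entries allowed for the given part."""
    n = M.shape[0]
    i, j = np.indices((n, n))
    allowed = {"0": i == j, "+": i < j, "-": i > j}[part]
    return np.allclose(M[~allowed], 0)


def br(x, y):
    # commutator; a constant factor in the Lie product would not change the subspace checks
    return x @ y - y @ x


def check_gl(n):
    """Check (T1)-(T4) for gl(n, C) with * = conjugate transpose; return (rank, degree)."""
    L = {p: matrix_units(n, p) for p in "0+-"}
    assert sum(len(v) for v in L.values()) == n * n                        # (T1)
    for p in "+-":
        assert all(lies_in(br(x, y), p) for x in L[p] for y in L[p])        # subalgebras
        assert all(lies_in(br(h, x), p) for h in L["0"] for x in L[p])     # (T2)
    assert all(lies_in(x.conj().T, "0") for x in L["0"])                     # (T3)
    assert all(lies_in(x.conj().T, "-") for x in L["+"])
    assert all(np.allclose(br(h, k), 0) for h in L["0"] for k in L["0"])     # (T4): abelian
    assert lies_in(np.eye(n), "0")              # the center (scalar matrices) lies in L_0
    return len(L["0"]) - 1, len(L["+"])         # the center of gl(n, C) is 1-dimensional


print(check_gl(3), check_gl(4))   # (2, 3) (3, 6)
```

The degree $n(n-1)/2$ counts the strictly upper triangular matrix units, and the rank $n-1$ is the dimension of the diagonal matrices modulo the scalars.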

A triangulated Lie algebra is a Lie *-algebra with a distinguished triangular decomposition. We call the number $\operatorname{rk} L := \dim(L_0/Z(L))$ the rank, and $\deg L := \dim L_\pm$ the degree of the triangulated Lie algebra $L$. The elements of the dual space² $L_0'$ are called weights. A highest weight representation is a representation $J$ of $L$ on a vector space $V$ with a distinguished element $1$, called the ground state³, such that

(HW1) $J(a)1 = 0$ for all $a \in L_-$, and

(HW2) $J(a)1 \in \mathbb{C}1$ for all $a \in L_0$.

The elements of $L_-$ thus behave like annihilation operators. The defining properties imply that

$$J(a)1 = w(a)1 \quad \text{for } a \in L_0$$

defines a weight $w \in L_0'$, called the highest weight of the representation. A highest weight representation is irreducible if and only if the elements $a_1\cdots a_k1$ with $a_1, \dots, a_k \in L_+$ span a dense subspace of $V$. In an irreducible highest weight representation with highest weight $w$, all Casimir elements $C$ of $L$ have a fixed value $C(w) \in \mathbb{C}$.

The spectrum of $L$ is the set $S(L)$ of weights $w$ for which a unitary group representation exists whose associated infinitesimal representation is a highest weight representation of $L$ with highest weight $w$. The spectrum of $L$ determines the possible spectra of each Casimir element $C$ in arbitrary unitary representations of the universal covering group of $L$, since the possible eigenvalues are precisely the possible $C(w)$, where $w$ ranges over the spectrum of $L$.

Note that a weight w belongs to the spectrum of L iff there is a unitary (cf. Definition 
10.2.1) highest weight representation of L with highest weight w. In this case, there is a 
Euclidean inner product on V, and without loss of generality, the ground state 1 may be 
assumed to be normalized. 

²Since in the context of *-Lie algebras the notation $V^*$ for the dual of $V$ is ambiguous, we use in this section a prime to indicate the dual.

³In a quantum field theory context, the ground state is referred to as the vacuum.
The semisimple case. There are many examples of triangulated Lie algebras, related to finite-dimensional semisimple Lie algebras (see the outline below) and to important classes of infinite-dimensional Lie algebras.

We mention without proof (which can be found in many places, e.g., Fuchs & Schweigert [84], Fulton & Harris [85], Humphreys [117], Jacobson [121], Knapp [137], Kirillov [134]) a number of facts about finite-dimensional semisimple Lie algebras.

All finite-dimensional semisimple real Lie algebras have a triangular decomposition, which is unique up to automorphisms. In this case, $L_0$ is a Cartan subalgebra (a maximal abelian subalgebra generated by diagonal matrices in the adjoint representation, for some choice of basis), which is unique up to conjugation, and the Lie algebra $L$ decomposes as

$$L = L_0 \oplus \bigoplus_{\alpha\in\Delta} L_\alpha,$$

where $\Delta \subset L_0'$ is the set of roots. The roots are nonzero elements of the dual of the Cartan subalgebra such that the Cartan subalgebra acts diagonally on $L_\alpha$:

$$h \angle x = \alpha(h)\,x, \quad h \in L_0,\ x \in L_\alpha.$$

$L_\alpha$ is always 1-dimensional; any nonzero element of $L_\alpha$ is called a root generator. For each root $\alpha \in \Delta$ the negative $-\alpha$ is also a root: for all $\alpha \in L_0'$, if $L_\alpha \neq 0$, then $L_{-\alpha} \neq 0$. Therefore there exists a choice of ordering such that $\Delta$ can be written as the union of the set $\Delta^+$ of positive roots and the set $\Delta^-$ of negative roots, with $\Delta^- = -\Delta^+$, in such a way that the cones of nonnegative linear combinations of $\Delta^+$ and of $\Delta^-$ intersect in $0$ only. One defines

$$L_\pm = \bigoplus_{\alpha\in\Delta^\pm} L_\alpha,$$

and finds (using further properties of the roots) that the semisimple Lie algebra is a triangulated Lie algebra.

16.1.2 Example. Take $L = sl(n,\mathbb{C})$, the Lie algebra of $n\times n$ matrices with trace zero. Let us write $E_{ij}$ for the matrix that is 1 in the $(i,j)$-entry and zero everywhere else. Then the diagonal matrices that have trace zero make up the Cartan subalgebra, which is thus spanned by the matrices $E_{ii} - E_{i+1,i+1}$ for $1 \le i \le n-1$, so that the rank is $n-1$. We have for $h = \operatorname{diag}(h_1, \dots, h_n) \in L_0$ in the Cartan subalgebra and for $E_{ij}$ with $i \neq j$

$$h \angle E_{ij} = (h_i - h_j)E_{ij}.$$

Hence the roots are of the form $\lambda_i - \lambda_j$, where $\lambda_i$ reads off the $i$th diagonal entry of an element of the Cartan subalgebra. We can choose a root $\lambda_i - \lambda_j$ to be positive if $i < j$. Then $L_+$ consists of the strictly upper triangular matrices, and $L_-$ of the strictly lower triangular matrices. The positive root generators are the $E_{ij}$ with $i < j$.
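The root computation can be checked numerically. This sketch takes the Lie product to be the commutator $[h, x] = hx - xh$, as in the displayed formula, and uses 0-based indices. The chosen $h$ is an arbitrary traceless diagonal matrix. The sketch also confirms that the $n-1$ matrices $E_{ii} - E_{i+1,i+1}$ are linearly independent.

```python
import numpy as np


def E(n, i, j):
    """Matrix unit E_ij (0-based indices)."""
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M


def positive_roots(h_diag):
    """For diagonal h, verify [h, E_ij] = (h_i - h_j) E_ij and return the values for i < j."""
    n = len(h_diag)
    h = np.diag(h_diag)
    roots = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                lhs = h @ E(n, i, j) - E(n, i, j) @ h
                alpha = float(h_diag[i] - h_diag[j])
                assert np.allclose(lhs, alpha * E(n, i, j))
                if i < j:
                    roots[(i, j)] = alpha
    return roots


n = 3
cartan = [E(n, i, i) - E(n, i + 1, i + 1) for i in range(n - 1)]
print(np.linalg.matrix_rank(np.array([c.ravel() for c in cartan])))  # 2 = n - 1
print(positive_roots(np.array([3.0, 1.0, -4.0])))  # {(0, 1): 2.0, (0, 2): 7.0, (1, 2): 5.0}
```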

Associated with each semisimple Lie algebra is a weight lattice, which is a discrete additive subgroup of $L_0'$ and whose elements are called integral weights. Additionally, there is a distinguished subset of the weight lattice, which is closed under addition and whose elements are called dominant integral weights. In terms of these:

(i) For each weight $w$, there is a Lie representation with $w$ as highest weight.

(ii) A highest weight representation is finite-dimensional if and only if the highest weight is dominant and integral.

(iii) For compact finite-dimensional Lie algebras, that is, finite-dimensional Lie algebras with a negative definite Cartan-Killing form (these are automatically semisimple, see Lemma 10.6.1), a highest weight representation is unitary if and only if it is finite-dimensional. The inner product is then uniquely determined by the requirement that the ground state $1$ is normalized.

(iv) The Lie algebra induces a unitary representation of the universal covering group $G$ if and only if $w$ is a dominant integral weight. Thus the spectrum of $L$ consists of all dominant integral weights of $L$.

In the context of an integrable classical theory associated with $L$, (iv) is equivalent to the Bohr-Sommerfeld quantization condition. (This folklore result is never stated in a precise form, but see, e.g., Voros [253], Kochetov [138], and Gadiyar [86].)



16.2 Triangulated Lie algebras of rank and degree one 

We have seen that the oscillator algebra os(1) has a triangular decomposition of rank and degree 1. A general triangulated *-Lie algebra of rank and degree 1 with center $\mathbb{C}$ must be the direct sum of the algebras

$$L_- = \mathbb{C}a, \quad L_+ = \mathbb{C}a^*, \quad L_0 = \mathbb{C} + \mathbb{C}h,$$

where $h$ is a fixed element in $L_0 \setminus \mathbb{C}$. The center $\mathbb{C}$ commutes with everything, but $h$ in general does not, which is the case we consider here. Then we may rescale $h$ to obtain

$$a \angle h = ia.$$

The operation * then gives

$$a^* \angle h = -ia^*.$$

For the Lie product of $a$ and $a^*$ we introduce complex numbers $u$ and $v$ and write

$$a \angle a^* = i(uh + v),$$

but noting that $(a \angle a^*)^* = -a \angle a^*$ we see that $u, v \in \mathbb{R}$. It is easy to check that for all $u, v \in \mathbb{R}$ the Jacobi identities are fulfilled, and hence for all real numbers $u, v$ we have a Lie *-algebra.
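The claim that the Jacobi identities hold for all $u, v$ can be checked mechanically from structure constants. The Python sketch below encodes $a\angle h = ia$, $a^*\angle h = -ia^*$ and $a\angle a^* = i(uh + v)$. It writes the central term as $vz$, with a hypothetical basis element $z$ spanning the center. It then checks the Jacobi identity on all triples of basis elements.

```python
import numpy as np
from itertools import product

A, AS, H, Z = range(4)   # basis: a, a*, h, and a central element z (so "v" means v*z)


def structure_constants(u, v):
    """c[i, j] = coordinates of b_i ∠ b_j for the rank-1, degree-1 family."""
    c = np.zeros((4, 4, 4), dtype=complex)
    c[A, H, A], c[H, A, A] = 1j, -1j             # a ∠ h = i a
    c[AS, H, AS], c[H, AS, AS] = -1j, 1j         # a* ∠ h = -i a*
    c[A, AS, H], c[A, AS, Z] = 1j * u, 1j * v    # a ∠ a* = i (u h + v z)
    c[AS, A, H], c[AS, A, Z] = -1j * u, -1j * v  # antisymmetry
    return c


def lie(c, x, y):
    """Lie product of coordinate vectors: (x ∠ y)_k = sum_ij x_i y_j c[i, j, k]."""
    return np.einsum("i,j,ijk->k", x, y, c)


def jacobi_defect(c):
    """Largest violation of x∠(y∠w) + y∠(w∠x) + w∠(x∠y) = 0 over basis triples."""
    e = np.eye(4)
    return max(np.abs(lie(c, e[i], lie(c, e[j], e[k]))
                      + lie(c, e[j], lie(c, e[k], e[i]))
                      + lie(c, e[k], lie(c, e[i], e[j]))).max()
               for i, j, k in product(range(4), repeat=3))


for u, v in [(0, 0), (0, 1), (1, 0), (-1, 0), (2.5, -0.7)]:
    print(u, v, jacobi_defect(structure_constants(u, v)))
```

Checking basis triples suffices because the Jacobi expression is trilinear.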

For the two-parameter family of Lie *-algebras just defined there are essentially four different cases:

1. $u = v = 0$. This is the *-Lie algebra $iso(2) \oplus \mathbb{C}$.

2. $u = 0$, $v = \pm 1$. If $u = 0$ and $v \neq 0$, we can rescale $a$ and $a^*$ as $a \mapsto \lambda a$ and $a^* \mapsto \lambda^* a^*$ to get $v = \pm 1$. By complex conjugation of the algebra we can then choose the sign of $v$, and we find the *-Lie algebra os(1). For the oscillator algebra we have $h \sim a^*a$.

3. $u = 1$ and $v = 0$. This *-Lie algebra is $so(2,1) \oplus \mathbb{C}$. If $u$ and $v$ are both nonzero, we can redefine $h$ as $h \mapsto \alpha h + \beta$ for some $\alpha, \beta \in \mathbb{C}$ to obtain this case or the next one.

4. $u = -1$ and $v = 0$. This is the *-Lie algebra $so(3) \oplus \mathbb{C}$.

Note that the elements $a$ and $a^*$ are abstract vectors from the point of view of Lie algebras. That means that we cannot say that $a^*$ is the conjugate of $a$; it is only in Lie *-algebras, in *-Poisson algebras and in their unitary representations that we can say that $a^*$ is the Hermitian conjugate of $a$. It is for these reasons that we have treated case 3 and case 4 separately. In a unitary representation we have $J(f \angle g) = \frac{i}{\hbar}\big(J(f)J(g) - J(g)J(f)\big)$, so that $J(f)^* = J(f^*)$ makes sense.

As alluded to before, $so(2,1)$ and $so(3)$ are isomorphic as complex Lie algebras. If we define in $so(2,1)$ the elements $r = ia$, $s = ia^*$, we obtain the relations

$$h \angle r = -ir, \quad h \angle s = is, \quad r \angle s = -ih,$$

which define case 4 of the list above: $so(3)$. However, the map from $so(2,1)$ to $so(3)$ does not preserve the *-operation, since $r^* = (ia)^* = -ia^* \neq s$. That means that $so(2,1)$ and $so(3)$ are not isomorphic as Lie *-algebras.
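A numerical check of this statement is sketched below. It uses a hypothetical coordinate encoding with basis $a$, $a^*$, $h$ and a central element $z$, and takes the relations $a\angle h = ia$, $a^*\angle h = -ia^*$, $a\angle a^* = i(uh + v)$ of the family above. The linear map sending the basis of case 4 to $r = ia$, $s = ia^*$, $h$, $z$ in case 3 respects the Lie products. It does not intertwine the *-operations, since $r^* = -s$.

```python
import numpy as np
from itertools import product

A, AS, H, Z = range(4)   # basis: a, a*, h, central z (hypothetical encoding)


def structure_constants(u, v=0.0):
    c = np.zeros((4, 4, 4), dtype=complex)
    c[A, H, A], c[H, A, A] = 1j, -1j             # a ∠ h = i a
    c[AS, H, AS], c[H, AS, AS] = -1j, 1j         # a* ∠ h = -i a*
    c[A, AS, H], c[A, AS, Z] = 1j * u, 1j * v    # a ∠ a* = i (u h + v z)
    c[AS, A, H], c[AS, A, Z] = -1j * u, -1j * v
    return c


def lie(c, x, y):
    return np.einsum("i,j,ijk->k", x, y, c)


def star(x):
    """*-operation on coordinates: conjugate the coefficients and swap a <-> a*."""
    y = np.conj(np.asarray(x, dtype=complex))
    y[[A, AS]] = y[[AS, A]]
    return y


c21, c3 = structure_constants(1.0), structure_constants(-1.0)   # cases 3 and 4
# psi sends the basis a, a*, h, z of case 4 to r = i a, s = i a*, h, z in case 3
psi = np.diag([1j, 1j, 1.0, 1.0])
e = np.eye(4)
is_hom = all(np.allclose(psi @ lie(c3, e[i], e[j]), lie(c21, psi @ e[i], psi @ e[j]))
             for i, j in product(range(4), repeat=2))
r, s = psi @ e[A], psi @ e[AS]
print(is_hom, np.allclose(star(r), s), np.allclose(star(r), -s))  # True False True
```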

Among the triangulated Lie algebras of rank and degree 1 listed above, the most interesting cases for both classical and quantum mechanics are os(1) and $so(3) \oplus \mathbb{C}$. As we have seen, the oscillator algebra os(1) is related to the harmonic oscillator. The algebra $so(3) \oplus \mathbb{C}$ involves infinitesimal ordinary rotations and arises when dealing with the spinning top, as explained in Chapter 14. The algebra $so(2,1) \oplus \mathbb{C}$ is less prominent in classical mechanics, although it arises in the analysis of the celestial 2-body problem. The algebra $so(2,1) \oplus \mathbb{C}$ has important applications to exactly solvable problems in quantum mechanics, and even appears in so-called gauged supergravity theories.



16.3 Unitary representations of SU(2) and SO(3)

We now discuss the unitary representations of the Lie groups $SU(2)$ and $SO(3)$. The method presented below is often encountered in quantum physics textbooks. In Section 16.4 we discuss the highest weight representations of triangulated Lie algebras of rank and degree 1, which shows a great similarity with the discussion here.

Since the group $SU(2)$ is compact, it has an invariant Haar measure $d\mu(g)$. Therefore we can integrate over the group in an invariant way; invariance of the Haar measure means

$$\int f(hg)\,d\mu(g) = \int f(g)\,d\mu(g).$$

If $SU(2)$ acts on a vector space $V$ with an inner product $\langle\cdot,\cdot\rangle_0$, we can integrate over the group to get an invariant inner product:

$$\langle v, w\rangle = \int_{SU(2)} \langle g\cdot v, g\cdot w\rangle_0\,d\mu(g),$$

where we denote the action of $g \in SU(2)$ on $v \in V$ by $g\cdot v$. It is a direct consequence of the invariance of the Haar measure that the inner product $\langle\cdot,\cdot\rangle$ is $SU(2)$-invariant. Hence we have realized $SU(2)$ by unitary matrices; every representation of $SU(2)$ is equivalent to a unitary representation.
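The averaging argument can be tried out numerically. Computing the Haar integral over all of $SU(2)$ would require quadrature. As an illustrative substitute (not the construction in the text), this sketch averages over the finite subgroup $Q_8 = \{\pm 1, \pm i\sigma_1, \pm i\sigma_2, \pm i\sigma_3\}$ of $SU(2)$. For a finite group the normalized counting measure is invariant, so the same argument applies verbatim. The sketch starts from a deliberately non-unitary but equivalent representation $g \mapsto SgS^{-1}$; the averaged inner product is invariant, and in an orthonormal basis for it all representation matrices become unitary.

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
one = np.eye(2, dtype=complex)

# Quaternion group Q8 = {±1, ±i s1, ±i s2, ±i s3}, a finite subgroup of SU(2)
Q8 = [sgn * g for sgn in (1, -1) for g in (one, 1j * s1, 1j * s2, 1j * s3)]

S = np.array([[1, 2], [0, 1]], dtype=complex)      # invertible but not unitary
rho = [S @ g @ np.linalg.inv(S) for g in Q8]       # equivalent, non-unitary representation

# Averaged inner product <v, w> = (1/|G|) sum_g <rho(g) v, rho(g) w>_0 = v^H M w
M = sum(r.conj().T @ r for r in rho) / len(rho)

invariant = all(np.allclose(r.conj().T @ M @ r, M) for r in rho)

# In an M-orthonormal basis (M = R^H R) every rho(g) becomes unitary
R = np.linalg.cholesky(M).conj().T
unitary = [R @ r @ np.linalg.inv(R) for r in rho]
print(invariant, all(np.allclose(U.conj().T @ U, one) for U in unitary))  # True True
```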

Since the group $SU(2)$ is compact and simply connected, there is a one-to-one correspondence between the representations of the group and the representations of the Lie algebra $su(2) \cong so(3)$. The Lie algebra consists of antihermitian matrices, but multiplying them by $i$ we obtain Hermitian matrices, and we may use the Pauli matrices to describe $su(2)$. Finding all representations of the Lie algebra $su(2)$ therefore gives all representations of $SU(2)$.

We put $t_i = \frac12\sigma_i$ for $i = 1, 2, 3$ and define $L_\pm = t_1 \pm it_2$, and obtain a triangulated algebra with trivial center:

$$[t_3, L_\pm] = \pm L_\pm, \quad [L_+, L_-] = 2t_3.$$

In a unitary representation we require that $t_3$ is Hermitian and $L_\pm^* = L_\mp$. If $v$ is an eigenvector of $t_3$ with eigenvalue $a$, then $L_-v$ is, unless it vanishes, an eigenvector of $t_3$ with eigenvalue $a - 1$. For a finite-dimensional representation we cannot lower the eigenvalue forever, and hence there exists a vector $v$ with $L_-v = 0$.

Assume that we have $t_3v = av$ for some complex number $a$. Acting on $v$ with $L_+$ we get vectors with eigenvalues $a, a+1, a+2, \dots$. Again this series has to terminate. Thus, there is an eigenvector $w$ with eigenvalue $a + N$ that is annihilated by $L_+$. Since $t_3$ is Hermitian, vectors with different eigenvalues are orthogonal and hence linearly independent. Thus the $N + 1$ vectors with eigenvalues $a, \dots, a + N$ form an irreducible representation. The trace of $t_3$ is zero, since $\operatorname{tr} t_3 = -i\operatorname{tr}(t_1t_2 - t_2t_1)$. But then the sum of the eigenvalues should vanish:

$$0 = \sum_{n=0}^{N}(a + n) = (N+1)a + \tfrac12N(N+1) = \tfrac12(N+1)(2a + N).$$

It follows that $a = -N/2$. Therefore the eigenvalues are $-\frac N2, -\frac N2 + 1, \dots, \frac N2 - 1, \frac N2$. Conversely, for all integers $N \ge 0$ we find a representation by giving vectors $v_a$ with $-N/2 \le a \le N/2$ and defining the action of $t_3$ and $L_\pm$ by the above rules. We then recover $t_1$ and $t_2$ by $t_1 = \frac12(L_+ + L_-)$ and $t_2 = \frac{1}{2i}(L_+ - L_-)$. We can thus label the finite-dimensional irreducible representations of $su(2)$ by half-integers $j = 0, 1/2, 1, 3/2, 2, \dots$. We denote them by $D_j$; note that we have $j = N/2$. The dimension of the representation $D_j$ is $2j + 1$ and the eigenvalues of $t_3$ are $-j, -j+1, \dots, j-1, j$. The Casimir defined by

$$J^2 = t_1t_1 + t_2t_2 + t_3t_3 = L_+L_- - t_3 + t_3t_3$$

has the value $j(j+1)$ on the representation $D_j$, since acting on the state $v$ with eigenvalue $-j$, which is annihilated by $L_-$, gives

$$J^2v = (L_+L_- - t_3 + t_3t_3)v = (0 + j + j^2)v = j(j+1)v.$$
The number $j$ is called the spin of the representation. Clearly $D_j$ is irreducible.
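Here is a numerical sketch of $D_j$ for small $j$. The coefficients of $L_\pm$ below, $\sqrt{j(j+1) - m(m+1)}$, are the usual choice that makes $L_- = L_+^*$ compatible with the commutation relations. They are an assumption here, since the text leaves them implicit in "the above rules". The loop checks the commutation relations, the two expressions for $J^2$, and the value $j(j+1)$.

```python
import numpy as np


def spin_rep(j):
    """t3, L+, L- on D_j in the t3-eigenbasis v_m, m = -j, ..., j.

    Uses the standard normalization L+ v_m = sqrt(j(j+1) - m(m+1)) v_{m+1}
    (an assumption; the text fixes the action only up to such coefficients).
    """
    m = np.linspace(-j, j, int(round(2 * j)) + 1)
    t3 = np.diag(m).astype(complex)
    Lp = np.zeros((len(m), len(m)), dtype=complex)
    for k in range(len(m) - 1):
        Lp[k + 1, k] = np.sqrt(j * (j + 1) - m[k] * (m[k] + 1))
    return t3, Lp, Lp.conj().T          # unitarity: L- = (L+)^*


def comm(x, y):
    return x @ y - y @ x


for j in (0, 0.5, 1, 1.5, 2):
    t3, Lp, Lm = spin_rep(j)
    t1, t2 = (Lp + Lm) / 2, (Lp - Lm) / 2j
    J2 = t1 @ t1 + t2 @ t2 + t3 @ t3
    ok = (np.allclose(comm(t3, Lp), Lp) and np.allclose(comm(t3, Lm), -Lm)
          and np.allclose(comm(Lp, Lm), 2 * t3)
          and np.allclose(J2, Lp @ Lm - t3 + t3 @ t3)
          and np.allclose(J2, j * (j + 1) * np.eye(len(t3))))
    print(j, len(t3), ok)    # dimension 2j + 1 and True for each j
```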

The representations that correspond to nonintegral $j$ cannot be lifted to representations of $SO(3)$. Although the Lie algebras $su(2)$ and $so(3)$ are isomorphic, the groups $SO(3)$ and $SU(2)$ are not! As mentioned before, $SU(2)$ is the universal covering group of $SO(3)$. In fact, we have $SO(3) \cong SU(2)/\mathbb{Z}_2$. That means that there is an action of $\mathbb{Z}_2$ on $SU(2)$ such that $SO(3)$ is the manifold $SU(2)$ with the points that are related by the $\mathbb{Z}_2$-action identified. In Section 1.5 we gave details on how $SO(3)$ and $SU(2)$ are related by a 2-1 map $SU(2) \to SO(3)$. If a representation of $SU(2)$ is such that $\mathbb{Z}_2$-related points have the same image under the representation, we have a well-defined representation of $SO(3)$; these are thus precisely the $\mathbb{Z}_2$-invariant representations. It turns out that only the representations with integer $j$ correspond to $\mathbb{Z}_2$-invariant representations. In physics, particles are represented by fields that take values in an $su(2)$-representation. The representations $D_j$ for $j = 1/2, 3/2, 5/2, \dots$ correspond to fermions and those for $j = 0, 1, 2, \dots$ to bosons.
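Both facts used here can be checked numerically. The sketch assumes the common description $R_{ik} = \frac12\operatorname{tr}(\sigma_iU\sigma_kU^*)$ of the covering map; Section 1.5 may present it differently. Under this formula, $U$ and $-U$ give the same rotation. The central element $-1 = \exp(2\pi i t_3)$ of $SU(2)$ acts on each $t_3$-eigenvector of $D_j$ with eigenvalue $m$ as $e^{2\pi i m}$, that is, as $(-1)^{2j}$. It therefore acts trivially exactly when $j$ is an integer.

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def rotation(U):
    """Covering map SU(2) -> SO(3): U s_k U^* = sum_i R_ik s_i, so R_ik = tr(s_i U s_k U^*)/2."""
    return np.array([[0.5 * np.trace(sigma[i] @ U @ sigma[k] @ U.conj().T).real
                      for k in range(3)] for i in range(3)])


def minus_one_on(j):
    """-1 = exp(2 pi i t3) in SU(2), acting on D_j (diagonal in the t3-eigenbasis)."""
    m = np.linspace(-j, j, int(round(2 * j)) + 1)
    return np.diag(np.exp(2j * np.pi * m))


alpha, beta = 0.6 + 0.0j, 0.48 + 0.64j        # |alpha|^2 + |beta|^2 = 1
U = np.array([[alpha, -np.conj(beta)], [beta, np.conj(alpha)]])
R = rotation(U)
print(np.allclose(R, rotation(-U)), np.allclose(R.T @ R, np.eye(3)), round(np.linalg.det(R), 6))
for j in (0.5, 1, 1.5, 2):
    dim = int(round(2 * j)) + 1
    print(j, np.allclose(minus_one_on(j), (-1) ** (dim - 1) * np.eye(dim)))
```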



16.4 Some unitary highest weight representations 

For the quantum theory one considers the unitary highest weight representations. We investigate the unitary highest weight representations for the triangulated Lie algebras of rank and degree 1 listed in Section 16.2. We thus look for a realization of operators $a$, $a^*$, $h$ and $1$ such that $1$ acts as the identity, $h$ acts diagonally and is Hermitian, $a^*$ is the adjoint of $a$, and the following relations hold (see (8.21) and Definition 10.2.1):

$$[a, h] = \hbar a, \quad [a^*, h] = -\hbar a^*, \quad [a, a^*] = \hbar(uh + v).$$

Furthermore, we assume there is a vector $|0\rangle$ with

$$a|0\rangle = 0, \quad h|0\rangle = \hbar\alpha|0\rangle.$$

By acting with $a^*$ on $|0\rangle$ we obtain the other vectors in the representation. We define

$$|k\rangle := \frac{(a^*)^k|0\rangle}{\hbar^k k!},$$

so that

$$a^*|k-1\rangle = \hbar k|k\rangle.$$

It follows that

$$h|k\rangle = \hbar(k + \alpha)|k\rangle.$$

We have $a|k\rangle = c_k|k-1\rangle$ and we want to determine $c_k$. Since $aa^* = [a, a^*] + a^*a$, we find

$$\hbar kc_k|k-1\rangle = aa^*|k-1\rangle = \big(\hbar(u\hbar(k-1+\alpha) + v) + \hbar(k-1)c_{k-1}\big)|k-1\rangle,$$

from which it follows that

$$kc_k - (k-1)c_{k-1} = \hbar u(k-1+\alpha) + v,$$

which is solved by

$$kc_k = 0\cdot c_0 + (1c_1 - 0c_0) + (2c_2 - 1c_1) + \dots + (kc_k - (k-1)c_{k-1}) = k\big(\hbar\alpha u + v + \tfrac12\hbar u(k-1)\big),$$

so that

$$a|k\rangle = \big(\hbar\alpha u + v + \tfrac12\hbar u(k-1)\big)|k-1\rangle.$$

The vectors $|k\rangle$ are orthogonal, as in the case of the harmonic oscillator. So we suppose $\langle j|k\rangle = N_k\delta_{jk}$, and calculate $\langle j|a^*|k\rangle$ in two ways:

$$\langle j|a^*|k\rangle = \hbar(k+1)\langle j|k+1\rangle,$$

$$\langle j|a^*|k\rangle = \big(v + \hbar\alpha u + \tfrac12\hbar u(j-1)\big)\langle j-1|k\rangle.$$

Choosing $k = j - 1$ we find

$$\hbar jN_j = \big(v + \hbar u\alpha + \tfrac12\hbar u(j-1)\big)N_{j-1}.$$

For a representation we require that $N_j \ge 0$ for all $j$. We may normalize $N_0 = 1$, and it follows that we must have $\alpha \in \mathbb{R}$. To have a faithful representation we need $|1\rangle \neq 0$ and thus $N_1 > 0$. We distinguish two further cases:

Case 1: $u > 0$. By assumption $|1\rangle$ is a nonzero vector and thus has a positive norm, i.e., $v + \hbar u\alpha > 0$. Since the coefficient $v + \hbar u\alpha + \frac12\hbar u(j-1)$ increases with $j$, all $N_j$ are then positive. Hence we find nonzero vectors for all $j \in \mathbb{N}_0$. An example of this case is given by $so(2,1)$, which is a noncompact Lie algebra. More generally, noncompact Lie algebras (defined by having a Cartan-Killing form that is not negative definite) do not admit a finite-dimensional unitary representation.

Case 2: $u < 0$. In this case the coefficient decreases with $j$, so $N_j$ would eventually become negative, unless it becomes zero for some integer $j = j_m + 1$, that is, unless

$$j_m + 2\Big(\alpha + \frac{v}{\hbar u}\Big) = 0.$$

In this case we thus have a finite-dimensional unitary representation for every integer $j_m = 0, 1, 2, \dots$. The dimension of the representation is $j_m + 1$. If $j_m = 0$, the vector $|1\rangle$ is already zero, and hence $a^*$ operates as $0$ in this representation. Therefore, if $j_m \ge 1$ the representations are faithful.
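The recursion for the norms can be iterated directly, which makes the two cases visible. This sketch hard-codes the recursion $\hbar jN_j = (v + \hbar u\alpha + \frac12\hbar u(j-1))N_{j-1}$ with $N_0 = 1$, and uses $\hbar = 1$ by default. As a check against Section 16.3, it uses $so(3)$ in the normalization $h = t_3$, $a = L_-$, $a^* = L_+$, $\hbar = 1$. There $[a, a^*] = -2t_3$ gives $u = -2$, $v = 0$, and the lowest state with $t_3 = -j$ gives $\alpha = -j$. The dimensions $2j + 1$ come out as expected. The function name `norms` is illustrative.

```python
def norms(u, v, alpha, hbar=1.0, jmax=50, tol=1e-12):
    """Iterate hbar*j*N_j = c_j * N_{j-1}, N_0 = 1, with c_j = v + hbar*u*alpha + hbar*u*(j-1)/2.

    Returns [N_0, N_1, ...]. The list stops early when some c_j vanishes (then |j> = 0 and the
    representation is finite-dimensional); a ValueError signals that unitarity fails.
    """
    N = [1.0]
    for j in range(1, jmax + 1):
        c = v + hbar * u * alpha + 0.5 * hbar * u * (j - 1)
        if abs(c) < tol:
            return N
        if c < 0:
            raise ValueError(f"N_{j} would be negative: no unitary representation")
        N.append(c * N[-1] / (hbar * j))
    return N


# so(3) with h = t3, a = L-, a* = L+, hbar = 1 gives u = -2, v = 0, alpha = -j
print([len(norms(-2, 0, -j)) for j in (0, 0.5, 1, 1.5, 2)])   # [1, 2, 3, 4, 5]
print(norms(0, 1, 0.0, jmax=4))       # oscillator (u = 0, v = 1): N_j = 1/j!
```

With $u > 0$, for example `norms(1, 0, 0.5)`, the list runs up to `jmax` with all entries positive, as in Case 1.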

For the triangulated Lie algebras of rank and degree 1, there is a Casimir operator of the form $C = \hbar a^*a - q(h)$ for some quadratic $q(h)$. From Section 16.3 we know that $so(3)$ has the Casimir $J^2 = L_+L_- - t_3 + t_3t_3$. And for the algebra os(1) the element $C = \hbar a^*a - \hbar h$ is a Casimir. For the harmonic oscillator we then have $C = 0$, since $h$ is precisely $a^*a$. For $so(2,1)$ this does not work; there is no analogue of the number operator with only integer eigenvalues. That is, the Lie algebra $so(2,1)$ does not admit a discrete Casimir.
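The quadratic $q$ can be made explicit. Since $[a, h] = \hbar a$ gives $ah = (h + \hbar)a$, one finds $[a, q(h)] = (q(h+\hbar) - q(h))a$, while $[a, \hbar a^*a] = \hbar^2(uh + v)a$. So $C = \hbar a^*a - q(h)$ commutes with $a$, and by a similar computation with $a^*$, when $q(h+\hbar) - q(h) = \hbar^2(uh + v)$. This condition is solved by $q(h) = \frac12\hbar uh^2 + (\hbar v - \frac12\hbar^2u)h$. This explicit formula is derived here and is not stated in the text. For $u = 0$, $v = 1$ it gives $q(h) = \hbar h$. The sketch checks it on spin-$j$ matrices in the normalization $a = L_-$, $a^* = L_+$, $h = t_3$, $\hbar = 1$ (so $u = -2$, $v = 0$). There $C$ reduces to $L_+L_- - t_3 + t_3t_3 = j(j+1)$. The matrix coefficients $\sqrt{j(j+1) - m(m+1)}$ are the usual normalization and are assumed here.

```python
import numpy as np


def q(h, u, v, hbar=1.0):
    """Quadratic with q(h + hbar) - q(h) = hbar**2 * (u*h + v); h may be a square matrix."""
    return 0.5 * hbar * u * (h @ h) + (hbar * v - 0.5 * hbar**2 * u) * h


def casimir(a, a_star, h, u, v, hbar=1.0):
    """C = hbar * a^* a - q(h)."""
    return hbar * a_star @ a - q(h, u, v, hbar)


def spin_rep(j):
    """t3, L+, L- on D_j (standard normalization, assumed)."""
    m = np.linspace(-j, j, int(round(2 * j)) + 1)
    Lp = np.diag(np.sqrt(j * (j + 1) - m[:-1] * (m[:-1] + 1)), k=-1).astype(complex)
    return np.diag(m).astype(complex), Lp, Lp.conj().T


for j in (0.5, 1, 1.5, 2):
    t3, Lp, Lm = spin_rep(j)
    C = casimir(Lm, Lp, t3, u=-2, v=0)      # a = L-, a* = L+, h = t3, hbar = 1
    print(j, np.allclose(C, j * (j + 1) * np.eye(len(t3))),
          all(np.allclose(C @ X, X @ C) for X in (t3, Lp, Lm)))
```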



Chapter 17 

Spectroscopy and spectra 



This final chapter applies the Lie theoretic structure to the analysis of quantum spectra.

After a short history of some aspects of spectroscopy, we look at the spectrum of bound systems of particles. We show how to obtain from a measured spectrum the spectrum of the associated Hamiltonian, and discuss qualitative results on vibrations (giving discrete spectra) and chemical reactions (giving continuous spectra) that come from considering simple systems and approximate symmetries. The latter are shown to result in a clustering of spectral values.

The structure of the clusters is determined by how the irreducible representations of a 
dynamical Lie algebra split when the algebra is reduced to a subalgebra of generating 
symmetries. The clustering can also occur in a hierarchical fashion with fine splitting and 
hyperfine splitting, corresponding to a chain of subgroups. As an example, we discuss the 
spectrum of the hydrogen atom. 



17.1 Introduction and historical background 

In this chapter we show some features of spectra and spectroscopy. In the preceding chapters we discussed properties of systems. The Hamiltonian of a system has a spectrum consisting of its eigenvalues, but in practice one does not observe this spectrum directly; one observes energy differences. One perturbs the system, for example by shining light on it, and then observes some response. The responses give rise to the observed spectrum, the study of which is spectroscopy.

To study the structure of molecules and atoms, one often relies on destructive methods. The destructive nature of the experiments in chemistry was taken as a primitive distinction between chemistry and physics. Nowadays the situation is different. In high-energy physics, one also shoots particles at each other such that the original particles are destroyed and energy is converted into the creation of other particles. On the other hand, in chemistry new laser techniques are used in which molecules are kept intact, and information about the structure of the molecular bonds is obtained.
With spectroscopy one can study properties of materials and mixtures without destroying the sample. Crudely speaking, there are two kinds of spectra, relying on different experimental methods. An emission spectrum is obtained by putting a system in a state of high energy. The system then falls back to a state with lower energy, and the energy difference is emitted in the form of light. Of course, in order to emit light, the system needs to interact with light. The kind of interaction then dictates which transitions are possible and hence which frequencies are emitted. For an absorption spectrum one more or less does the converse. One puts a system into a beam of (nearly) white light. The system then absorbs light at certain frequencies and re-emits it, but in all directions.

In the 19th century Kirchhoff used an invention of the German chemist Robert Bunsen to heat up elements in a flame and study the emitted light. He passed the light through a prism to study its intensity at different wavelengths. It turned out that the emitted spectrum of an element had clearly defined lines at certain wavelengths. In 1859 Kirchhoff pointed out that each of the elements he had been studying had a different emission spectrum. Hence disentangling the lines of an emission spectrum can help in finding the components of an unknown mixture. Figure 17.1 gives as an example the emission and absorption spectra of helium.



Figure 17.1: The emission (upper) and absorption (lower) spectrum of Helium. 

Much earlier, in 1670-1672, Isaac Newton had used a prism to study the decomposition of white light into a spectrum of different colors. In 1814 Joseph von Fraunhofer invented the spectroscope and identified 574 dark lines in the light of the sun. In fact, the Fraunhofer experiment can be done with primitive equipment. On a sunny, cloudless day one sits in a dark room with one little hole through which the sun shines. In the beam of sunlight one places a prism and lets the light leaving the prism fall onto a white piece of paper. The observed spectrum displays dark lines; in Figure 17.1 the lines corresponding to helium are displayed. The Fraunhofer lines are a manifestation of the absorption spectrum. It was Kirchhoff who later explained their origin: light from the sun has to pass through the atmosphere of the sun. The elements present in this atmosphere absorb certain parts of the sunlight at well-defined frequencies and re-emit it later, but in all directions. Therefore the sunlight going in the forward direction, that is, away from the core of the sun, has lost intensity at certain well-defined frequencies. In this way, Kirchhoff showed that the atmosphere of the sun contains, among others, hydrogen and sodium. We shall not explain why the sunlight is almost white before it enters the atmosphere of the sun. When the light of the sun reaches the earth it is already so diluted that the elements in the earth's atmosphere give almost unobservable absorption lines. Therefore, the dark lines in the spectrum of the sun are due to the sun's atmosphere and not the earth's atmosphere.

In 1868, the French astronomer Pierre-Jules-César Janssen observed a line in the spectrum of the sun that did not match any element known at the time. He, rather than Fraunhofer, observed it because he made use of the better observing conditions that a solar eclipse offers. Normally, the sun is too bright, but when the moon blocks the solar disc, one sees solely the atmosphere of the sun. The astronomer Joseph Norman Lockyer concluded that the new line must represent a new element. The name helium was coined from the Greek word "helios", which means sun. It was not until 1895 that the chemist William Ramsay proved that helium is also present on earth; he found it in samples of the mineral cleveite. He treated the mineral with acid, which reacted with the material and produced gases. He then studied the composition of the gas mixture and found that helium was present. The reason why he found helium was explained later: cleveite is a mineral that contains uranium. The element uranium is radioactive; it emits α-particles, which are the nuclei of helium atoms.



17.2 Spectra of systems of particles 

We distinguish two kinds of spectra: 

1. The spectrum in the sense of spectroscopy is the collection of frequencies emitted or absorbed by the system in its interaction with light or other electromagnetic (infrared, radio, X-ray) radiation.

2. The spectrum of a physical system is the collection of allowed energy values, i.e., the set of eigenvalues of the associated Hamiltonian.

The relation between the two is as follows. The observed spectrum (of spectroscopy) consists of the energy differences of the system: the observed lines are of the form $\hbar\omega_{mn} = E_m - E_n$, where $E_n$ are the energy levels of the system. In most systems the spectrum is discrete. Hence the observed spectrum is also discrete.
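As a small illustration of how observed lines arise as level differences, the sketch below takes the simple Bohr levels $E_n = -R/n^2$ of hydrogen. This idealization is not discussed in this section, and fine structure and selection rules are ignored. The sketch lists all differences $E_m - E_n$ together with the wavelength $\lambda = hc/(E_m - E_n)$. The $3 \to 2$ line comes out near 656 nm, the red Balmer line. The constant and function names are illustrative.

```python
from itertools import combinations

RYDBERG_EV = 13.605693    # Rydberg energy in eV (infinite nuclear mass)
HC_EV_NM = 1239.841984    # h*c in eV*nm


def bohr_levels(nmax):
    """Hydrogen levels E_n = -R/n^2 in the simple Bohr model (no fine structure)."""
    return {n: -RYDBERG_EV / n**2 for n in range(1, nmax + 1)}


def observed_lines(levels):
    """All differences hbar*omega_mn = E_m - E_n > 0 (no selection rules applied)."""
    lines = []
    for n, m in combinations(sorted(levels), 2):       # n < m, so E_m > E_n here
        dE = levels[m] - levels[n]
        lines.append((m, n, dE, HC_EV_NM / dE))         # (upper, lower, energy in eV, wavelength in nm)
    return sorted(lines, key=lambda line: line[2])


for m, n, dE, lam in observed_lines(bohr_levels(4)):
    print(f"{m} -> {n}: {dE:6.3f} eV  {lam:7.1f} nm")
```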

For systems made of constituents that can break apart, the spectrum contains continuous parts. Consider for example a molecule of two atoms like H2. Above a certain frequency the molecule can break apart. Then the energy of the photon can also be put into the kinetic energy of both H-atoms, which is a continuous parameter.

If the Hamiltonian has some nonreal eigenvalues $\lambda$, then $\operatorname{Im}\lambda < 0$ and the modes corresponding to $\lambda$ are decaying modes. In a dissipative environment this results in energy loss, and the system can move from higher energy to lower energy.

On the other hand, a system can also be excited. It then absorbs energy from the environment. A typical example of excitation is an atom interacting with light. The energy levels of the atom are discrete, and hence the atom can absorb a photon and attain a state with more energy only at fine-tuned frequencies. The energy difference between the ground state and the state with the second lowest energy is called the energy gap. If a photon has the frequency whose energy corresponds to the energy gap, it can be absorbed by the atom, and the atom can be excited to the state above the ground state.

By energy conservation, an excited atom cannot move down to a state with lower energy unless there is an interaction with light that carries away the energy difference. Incorporating the interaction with light into the Hamiltonian makes the energies acquire a small imaginary part, representing the possibility to decay. If an atom jumps down in energy, it emits a photon whose energy equals the energy difference. This process is called spontaneous emission. The nice feature of spontaneous emission is that we can observe it.

The interaction with light is not just any arbitrary interaction. The interaction term $V_{\rm int}$ in the Hamiltonian

$$H = H_0 + V_{\rm int}$$

needs to respect some symmetries like Galilean invariance. The result is that not all transitions but only a selected set of transitions is allowed. The rules that dictate which transitions are allowed are therefore called selection rules.

The interaction is often treated as a perturbation. The justification is that the interaction term in the Hamiltonian is small compared to the other terms. One introduces a dimensionless variable $\lambda$ and rewrites $V_{\rm int}$ as $V_{\rm int}(\lambda) = \lambda V_{\rm int}$. One recalculates the spectrum and expands it in $\lambda$ to find

$$E_k(\lambda) = E_k(0) + \lambda\,\Delta E_k^1 + \lambda^2\,\Delta E_k^2 + \dots.$$

Since the interaction is small, the first-order correction often describes the interaction with light accurately enough. Using the techniques of perturbation theory one then finds the possible transitions, i.e., the selection rules, and the probabilities of the transitions. The probabilities determine the relative strength of the lines in the observed spectrum: if a transition A is more probable than a transition B, this results in more spontaneous emission along transition A. Therefore the peak in the spectrum corresponding to A is bigger than the peak corresponding to B.
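The expansion can be illustrated on a small matrix model. The sketch uses the standard nondegenerate Rayleigh-Schrödinger formulas $\Delta E_k^1 = V_{kk}$ and $\Delta E_k^2 = \sum_{j\ne k}|V_{jk}|^2/(E_k(0) - E_j(0))$. These are textbook results, not derived in this section. The levels are chosen arbitrarily, $V_{\rm int}$ is a random real symmetric matrix, and the result is compared with exact diagonalization.

```python
import numpy as np

rng = np.random.default_rng(0)
E0 = np.array([0.0, 1.0, 2.5, 4.0])           # nondegenerate unperturbed levels E_k(0)
V = rng.normal(size=(4, 4))
V = (V + V.T) / 2                              # a real symmetric (Hermitian) V_int


def perturbative(E0, V, lam):
    """E_k(0) + lam*dE1_k + lam^2*dE2_k, dE1_k = V_kk, dE2_k = sum_{j!=k} |V_jk|^2/(E_k - E_j)."""
    n = len(E0)
    dE1 = np.diag(V)
    dE2 = np.array([sum(abs(V[j, k])**2 / (E0[k] - E0[j]) for j in range(n) if j != k)
                    for k in range(n)])
    return E0 + lam * dE1 + lam**2 * dE2


def exact(E0, V, lam):
    return np.linalg.eigvalsh(np.diag(E0) + lam * V)   # ascending, like E0 for small lam


for lam in (0.1, 0.01, 0.001):
    err = np.abs(exact(E0, V, lam) - perturbative(E0, V, lam)).max()
    print(f"lambda = {lam:g}: max error {err:.1e}")    # shrinks roughly like lambda^3
```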

Observed spectra are often displayed, as in Figure 17.2, by plotting the frequency on the horizontal axis and the observed intensity on the vertical axis. Due to imperfections in measuring methods one never observes a sharp peak, but always a smeared-out peak; that is, peaks have a width. However, there can be many reasons why a peak has a certain width. Imagine for example that one measures the spontaneous emission of a gas contained in a cylinder. The gas atoms are moving around in the cylinder, with different velocities with respect to the measuring device. For each atom the spectrum is shifted due to the Doppler effect, known from the analogous effect for sound that can be observed when an ambulance passes by. Since one measures the emission of a whole population of atoms, the measured peak is a superposition of peaks that are distributed around a certain frequency. That is, the Doppler effect broadens a peak.
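The Doppler broadening just described can be estimated in a few lines. Assume a gas in thermal equilibrium at temperature $T$. Each velocity component along the line of sight is then Gaussian with variance $k_BT/m$, and to first order in $v/c$ an atom moving with velocity $v$ toward the detector emits at $\nu_0(1 + v/c)$. The superposition is therefore a Gaussian line of standard deviation $\nu_0\sqrt{k_BT/(mc^2)}$. The sketch compares this formula with a direct superposition of sampled shifts, using illustrative values for hydrogen atoms at 300 K.

```python
import numpy as np

K_B = 1.380649e-23        # Boltzmann constant, J/K
C_LIGHT = 2.99792458e8    # speed of light, m/s
AMU = 1.66053907e-27      # atomic mass unit, kg


def doppler_sigma(nu0, mass, T):
    """Standard deviation of the Doppler-broadened line: nu0*sqrt(k_B*T/(m*c^2))."""
    return nu0 * np.sqrt(K_B * T / (mass * C_LIGHT**2))


def sample_frequencies(nu0, mass, T, n=200_000, seed=1):
    """Each atom emits at nu0*(1 + v/c); v is its Maxwell velocity component toward the detector."""
    rng = np.random.default_rng(seed)
    v = rng.normal(0.0, np.sqrt(K_B * T / mass), size=n)
    return nu0 * (1 + v / C_LIGHT)


nu0 = C_LIGHT / 656.3e-9          # red hydrogen line, Hz
m_H = 1.00784 * AMU
sigma = doppler_sigma(nu0, m_H, 300.0)
freqs = sample_frequencies(nu0, m_H, 300.0)
fwhm = 2 * np.sqrt(2 * np.log(2)) * sigma
print(f"sampled std / predicted: {freqs.std() / sigma:.3f}")
print(f"relative FWHM: {fwhm / nu0:.2e}")   # about 1.24e-05 at 300 K
```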

Technical imperfections of the measuring device also broaden peaks. Making the measuring 
equipment more and more accurate one can try to get a better and better resolved spectrum. 
Figure 17.2: An example of a spectrum.



Doing this, one might see broad peaks resolve into groups of smaller peaks; one thus sees more structure.

The result of a measurement is a list of data, the frequencies $\omega_l$. Using the data one wants to obtain information about the system under study. If one already knows the system quite well, for example if one knows the parametric form of the Hamiltonian but not the precise values of the parameters, one may fit the measured energies to obtain a set of parameters that describes the measurements best. One therefore has to solve a data analysis problem. For each label $l$ one has to find energies $E_{j(l)}$ and $E_{k(l)}$ with

$$\hbar\omega_l \approx E_{j(l)} - E_{k(l)}$$

within the experimental accuracy. Therefore, one solves the least-squares problem of minimizing the sum

$$S(E, j, k) = \sum_l q_l\big(E_{j(l)} - E_{k(l)} - \hbar\omega_l\big)^2$$

for some weight factors $q_l$ related to the inverse of the accuracy of the measurement of $\omega_l$.

In general, both the list $E$ of energy levels $E_i$ and the functions $j, k$, which determine the assignment of spectroscopic lines to transitions, are unknown and must be determined by minimizing $S(E,j,k)$. Usually, one starts with a preliminary list $E$ of energy levels, and assigns each line $l$ to a transition which minimizes the $l$th term, breaking ties arbitrarily. This defines preliminary assignment functions $j, k$. Fixing these turns the problem of minimizing $S(E,j,k)$ into a least-squares problem for finding the energy levels, resulting in an improved $E$. Clearly, no cycle increases the value of $S(E,j,k)$. The process is stopped when the assignments no longer change. Then $S(E,j,k)$ has reached a local minimum. Multiple lists of trial energy levels may be used to increase the likelihood that the assignment found corresponds to a global minimum. Frequently, one first assigns a subset of lines to a subset of levels to find good starting values.
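A compact Python sketch of the alternating procedure just described follows. It rests on assumptions not fixed by the text. The data are given directly as energies $\hbar\omega_l$. Every ordered pair of levels is a candidate transition, so no selection rules are applied. A tiny ridge term is added in the least-squares step, because $S$ depends only on differences of levels and would otherwise leave an additive constant free. The synthetic data and all function names are illustrative.

```python
import numpy as np
from itertools import permutations


def assign(E, omega, q):
    """For each line l choose (j, k) minimizing q_l*(E_j - E_k - omega_l)^2; min() keeps the first tie."""
    pairs = list(permutations(range(len(E)), 2))
    return [min(pairs, key=lambda jk: ql * (E[jk[0]] - E[jk[1]] - w) ** 2)
            for w, ql in zip(omega, q)]


def S(E, assignment, omega, q):
    return sum(ql * (E[j] - E[k] - w) ** 2 for (j, k), w, ql in zip(assignment, omega, q))


def refit(E, assignment, omega, q, eps=1e-8):
    """Weighted least squares for the levels with the assignment held fixed.

    A tiny ridge term eps*|E - E_old|^2 fixes the free additive constant and keeps
    levels that no line touches at their previous values.
    """
    n = len(E)
    rows, rhs = [], []
    for (j, k), w, ql in zip(assignment, omega, q):
        r = np.zeros(n)
        r[j], r[k] = 1.0, -1.0
        rows.append(np.sqrt(ql) * r)
        rhs.append(np.sqrt(ql) * w)
    rows.extend(np.sqrt(eps) * np.eye(n))
    rhs.extend(np.sqrt(eps) * np.asarray(E, dtype=float))
    return np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]


def fit_levels(E_start, omega, q, max_iter=100):
    """Alternate assignment and least-squares refit until the assignment stops changing."""
    E = np.asarray(E_start, dtype=float)
    assignment = assign(E, omega, q)
    for _ in range(max_iter):
        E = refit(E, assignment, omega, q)
        new_assignment = assign(E, omega, q)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return E, assignment, S(E, assignment, omega, q)


# Synthetic test: 4 levels, 6 lines (arbitrary energy units)
true_E = np.array([0.0, 1.0, 2.7, 3.1])
transitions = [(1, 0), (2, 0), (3, 0), (2, 1), (3, 1), (3, 2)]
noise = [0.002, -0.001, 0.003, -0.002, 0.001, 0.0]
omega = [true_E[j] - true_E[k] + d for (j, k), d in zip(transitions, noise)]
E, assignment, s = fit_levels(true_E + [0.0, 0.05, -0.04, 0.03], omega, [1.0] * 6)
print(assignment == transitions, np.round(E - E[0], 3), f"S = {s:.1e}")
```

Since only level differences are determined, the fitted levels are compared after subtracting the lowest one.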
Figure 17.3: Harmonic oscillator potential, with the eigenvalues indicated.




Figure 17.4: On the left a double well potential with the first energy levels indicated. On the 
right the Morse potential; the bound states have discrete energy but above the dissociation 
energy the spectrum is continuous. 

17.3 Examples of spectra 

The geometry of the molecule or atom under consideration strongly influences the spectrum, 
since the geometry determines the potential. 

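Consider a molecule of two atoms. We assume that the excitations inside each atom are of a
different order of magnitude than the excitations of the bond between the atoms. In that case
we may consider the molecule as two balls connected by a spring, that is, as a harmonic
oscillator. Its energy levels are equally spaced, as in Figure 17.3, and since in the harmonic
approximation only transitions between neighboring levels are allowed, all lines have the same
frequency: the observed spectrum consists of one peak.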

Consider now a system that has two local minima. An example of this would be a substituted
ethylene such as $\mathrm{C_2H_2Cl_2}$, of which two versions exist, the cis and the trans molecule.
The molecular bond between the two C atoms then behaves, in some approximation, like a
harmonic oscillator around each local minimum. For higher energies, however, the two states
start to interact and the molecule can change from cis to trans and vice versa. A typical
spectrum then looks like Figure 17.4.
Figure 17.5: Sketch of the potential a proton experiences in the force field of a nucleus.

When there are asymptotically free states, one says that the system admits dissociation. 
Free states have continuous kinetic energy and hence the spectrum contains continuous 
parts. A potential showing dissociation is the Morse potential, given by

$$V(r) = \alpha\bigl(e^{-2\gamma(r-\beta)} - 2e^{-\gamma(r-\beta)}\bigr), \qquad r > 0,$$

where $r$ is the atomic distance and $\alpha$, $\beta$, and $\gamma$ are positive parameters, see Figure 17.4.
The potential of the diatomic molecule discussed above (for example $\mathrm{H_2}$) is another
example. Above the dissociation energy the spectrum is continuous; the bound states have
discrete energy.
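The discrete part of the spectrum can be made concrete for the Morse potential, whose bound-state energies are known in closed form. The sketch below assumes the common parametrization $V(r) = D\bigl(e^{-2a(r-r_0)} - 2e^{-a(r-r_0)}\bigr)$ with well depth $D > 0$ and range parameter $a > 0$ (names chosen for this sketch), treats rotation as absent and $r$ as a coordinate on the whole real line (as is customary for the exact solution), and uses the standard result $E_n = -\frac{a^2\hbar^2}{2m}\bigl(\lambda - n - \tfrac12\bigr)^2$ with $\lambda = \sqrt{2mD}/(a\hbar)$, valid for $n < \lambda - \tfrac12$. Only finitely many levels lie below the dissociation threshold $V = 0$.

```python
import math


def morse_bound_levels(D, a, m=1.0, hbar=1.0):
    """Bound-state energies of the 1D Morse oscillator
    V(r) = D * (exp(-2a(r - r0)) - 2 exp(-a(r - r0))), with V -> 0 at dissociation:
    E_n = -(a*hbar)**2 / (2m) * (lam - n - 1/2)**2, lam = sqrt(2mD) / (a*hbar)."""
    lam = math.sqrt(2.0 * m * D) / (a * hbar)
    scale = (a * hbar) ** 2 / (2.0 * m)
    count = max(0, math.ceil(lam - 0.5))  # number of n >= 0 with n < lam - 1/2
    return [-scale * (lam - n - 0.5) ** 2 for n in range(count)]


levels = morse_bound_levels(D=10.0, a=1.0)
spacings = [hi - lo for lo, hi in zip(levels, levels[1:])]
print(levels)    # finitely many negative energies below the threshold 0
print(spacings)  # level spacings shrink toward the dissociation energy
```

Unlike the harmonic oscillator, the spacings decrease with $n$, and above $E = 0$ no further discrete levels exist; there the spectrum is continuous.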

Quantum physics has a remarkable feature compared to classical mechanics, called tunneling.
If a particle is in a local minimum at energy $E_1$ and another minimum is available with
energy level $E_0 < E_1$, then there is a nonzero probability that the particle 'travels
through the barrier' and ends up in the local minimum with lower energy. For example,
the potential of the cis-trans molecule discussed above admits tunneling since the potential
has two local minima. The probability of tunneling decreases rapidly with the height (and
the width) of the barrier between the two minima. Another example where tunneling occurs
is in nuclear physics; the potential of Figure 17.5 represents the energy a proton feels in the
potential field of a nucleus. The diameter of the nucleus is roughly the distance between the
two peaks in Figure 17.5. The difference from the cis-trans molecule is that here the
tunneling takes place between two states, one of which is not square integrable. Tunneling
can go in two different directions: in one direction the proton is shot at the nucleus with too
little energy to classically penetrate the nucleus; in the other direction the proton is inside
the nucleus and classically cannot get out. In the latter case, there is a certain probability
that the proton escapes the nucleus. This explains qualitatively the stochastic behavior of
radioactive decay.
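The dependence of tunneling on the barrier can be quantified, for instance, by the semiclassical (WKB) estimate $T \approx \exp\bigl(-\tfrac{2}{\hbar}\int \sqrt{2m(V(x) - E)}\,dx\bigr)$, with the integral running over the classically forbidden region. This approximation is not discussed in the text and only captures the exponential dependence; the sketch below (units $\hbar = m = 1$, a hypothetical rectangular barrier) evaluates the exponent numerically, so that for a rectangular barrier it reduces to $\exp(-2\kappa L)$ with $\kappa = \sqrt{2m(V_0 - E)}/\hbar$.

```python
import math


def wkb_transmission(V, E, x_left, x_right, m=1.0, hbar=1.0, steps=10_000):
    """WKB estimate exp(-(2/hbar) * integral of sqrt(2m(V(x) - E)) dx), where the
    integral (midpoint rule on [x_left, x_right]) runs over the region V(x) > E."""
    h = (x_right - x_left) / steps
    integral = 0.0
    for i in range(steps):
        excess = V(x_left + (i + 0.5) * h) - E
        if excess > 0:
            integral += math.sqrt(2.0 * m * excess) * h
    return math.exp(-2.0 * integral / hbar)


def rectangular_barrier(V0, width):
    """Hypothetical example potential: height V0 on |x| < width/2, zero elsewhere."""
    return lambda x: V0 if abs(x) < width / 2 else 0.0


for V0 in (2.0, 4.0, 8.0):  # particle energy E = 1, barrier width 1
    print(V0, wkb_transmission(rectangular_barrier(V0, 1.0), 1.0, -1.0, 1.0))
```

The printed estimates drop quickly as the barrier height grows, in line with the qualitative statement above.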



As another example, consider a chemical reaction of the form $AB + C \to A + BC$, that is,
the molecule $AB$ splits off a part $B$ that then attaches to $C$ to form $BC$. Here there are
two important parameters: the distance $|AB|$ between $A$ and $B$ and the distance $|BC|$
between $B$ and $C$. A possible potential is plotted in Figure 17.6. The plot shows two valleys
separated by a saddle point, marked by a red cross. The horizontal valley corresponds to
constant $|BC|$, hence to the state $A + BC$. The other valley corresponds to $AB + C$, and
at the saddle point the part $B$ is exchanged.

Figure 17.6: 2D plot of the potential experienced in a chemical reaction $AB + C \to A + BC$.
The potential depends on the distances $|AB|$ and $|BC|$. The upper part shows a
cross-section of the potential. The red marker is the saddle point.

17.4 Dynamical symmetries 

As discussed before, when one looks at a poorly resolved spectrum, one sees some rough 
features of the system under study. Improving the resolution allows one to study more 
structure of the system. 

A similar process happens when one studies a hydrogen atom in an external magnetic field.
Upon increasing the magnetic field one sees that many lines of the original spectrum split
into several close lines. Thus what first seems to be one state in fact turns out to be an
agglomeration of different states. Without the field these states could not be recognized as
different, since they have exactly the same energy. As we shall see, the fact that these states
(seemingly) agglomerate into one single state is a consequence of symmetry.

Rotational symmetry implies that states related by a rotation have the same energy; more
pictorially, whether an electron circles around the proton with the rotation axis in the
$z$-direction or in the $y$-direction gives the same energy. Turning on the magnetic field breaks
the symmetry; then the different states that first agglomerated to form a single state are
disentangled and can be observed separately in the spectrum.
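A minimal sketch of this splitting, assuming the idealized normal Zeeman effect (electron spin neglected, magnetic field $B$ along the $z$-axis): a level with orbital quantum number $l$ (see Section 17.5) splits into $2l+1$ equally spaced sublevels $E_0 + \mu_B B m$, $m = -l, \dots, l$, where $\mu_B$ is the Bohr magneton. The function name is hypothetical.

```python
BOHR_MAGNETON_EV_PER_T = 5.7883818060e-5  # Bohr magneton mu_B in eV/T (CODATA 2018)


def zeeman_sublevels(E0, l, B):
    """Normal Zeeman effect, spin neglected: E0 + mu_B * B * m for m = -l, ..., l."""
    return [E0 + BOHR_MAGNETON_EV_PER_T * B * m for m in range(-l, l + 1)]


print(zeeman_sublevels(-3.4, 1, 0.0))  # B = 0: the three sublevels coincide
print(zeeman_sublevels(-3.4, 1, 1.0))  # B = 1 T: spacing mu_B * B between sublevels
```

With $B = 0$ all $2l+1$ states share one energy (the rotationally symmetric case); a nonzero field lifts this degeneracy.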

As with increasing resolution, taking a closer look at the hydrogen atom reveals more and
more structure. In a first approximation, the electron in the hydrogen atom can be treated
nonrelativistically. Treating the electron relativistically, one gets a correction to the
spectrum. The first-order corrections of special relativity go under the name of the first
radiative corrections.

We shall look in some detail at the hydrogen atom once we have clarified the general 
principles. 

Symmetry and broken symmetry. The most symmetric physical systems, in particular the
standard two-body problems (the classical Kepler problem and the quantum hydrogen atom),
are exactly solvable. The helium atom is already a three-body problem and is not exactly
solvable.

A physical system is called exactly solvable (or integrable, or completely integrable) if it has
"enough" constants of motion or, equivalently, if the centralizer $C_E(H)$ of the Hamiltonian
$H$ in the algebra $E$ of observables is "large enough". The effect of having enough elements
commuting with $H$ is that the system has enough conserved quantities to solve the
differential equations of the system explicitly.

A dynamical algebra of a classical physical system is a Lie algebra $L$ that one can associate
to the system such that the Hamiltonian $H$ is contained in the Lie-Poisson algebra $C^\infty(L^*)$.
For a quantum mechanical system the corresponding requirement is that $H$ is contained in
the closure of the universal enveloping algebra $U(L)$ of $L$, equipped with a locally convex
topology such that the relevant potentials are allowed.

For example, the Heisenberg algebra $h(n)$ is the dynamical algebra of symplectic classical
systems with $n$ position degrees of freedom, and of traditional Schrödinger quantum
mechanics. The hydrogen atom has additional rotational symmetry, and the special properties
of the Coulomb potential imply that one can in fact find a fairly big dynamical algebra,
namely $so(2,4)$, see e.g. Wybourne [265].

Now consider any Lie algebra $L$ as a dynamical algebra. Call $E$ the Lie-Poisson algebra
associated to $L$ in the classical case, or the universal enveloping algebra of $L$ in the quantum
case. The symmetry algebra is the centralizer of the Hamiltonian in $E$, written $C_E(H)$.
In the 'nicest' case one has $E = C_E(H)$, which means that $H$ is a Casimir of $L$. Normally,
the Lie algebra $L$ describes the symmetries of the (unperturbed) system and thus one would
expect that the nicest case is the general case.

However, a very symmetric system is rarely studied in isolation, and realistic systems are
at best perturbations of nice systems. In this case one gets broken symmetries, meaning
that the Hamiltonian is only almost a Casimir. Note that it might happen that the classical
theory has a symmetry, but that in the quantum version of the theory the symmetry gets
broken. In case of a broken symmetry, one usually first tries to solve the symmetric problem
and then perturbs the solutions to get approximate solutions to the problem with broken
symmetry. We will not go into details about the mathematics of perturbation theory, since
this topic is amply treated in every book on quantum mechanics. But we will consider some
of its qualitative implications.

Suppose we have solved a symmetric problem. Then the solutions are described as elements
of some Hilbert space $\mathcal{H}$ on which $L$ acts unitarily. We can decompose the Hilbert space
into a direct sum of eigenspaces of the Hamiltonian, $\mathcal{H} = \bigoplus_\lambda \mathcal{H}_\lambda$. Let $\psi \in \mathcal{H}_\lambda$ be an
eigenstate of the Hamiltonian, $H\psi = \lambda\psi$, and let $f \in L$. Then

$$Hf\psi = [H,f]\psi + fH\psi = [H,f]\psi + \lambda f\psi .$$

Since $[f,H] = 0$, we find $Hf\psi = \lambda f\psi$, so $f\psi \in \mathcal{H}_\lambda$. Thus $L$ maps each eigenspace into
itself, and all $\mathcal{H}_\lambda$ are $L$-modules.

We call the eigenvalue $\lambda$ nondegenerate if the dimension of $\mathcal{H}_\lambda$ is 1, and degenerate if
it is bigger than 1. (Dimension zero means that $\lambda$ is not an eigenvalue.)

If $\lambda$ is degenerate, $\mathcal{H}_\lambda$ has many essentially distinct bases of eigenvectors of $H$. One of these
is usually distinguished by the concrete representation used to describe the module: $H$ and
usually a distinguished part of $L$ act diagonally. In general the perturbed Hamiltonian no
longer acts diagonally on $\mathcal{H}_\lambda$, and as a result the level $\lambda$ usually splits into several distinct
levels of the form $\lambda + \varepsilon_j$ for different but small values $\varepsilon_j$, giving rise to a fine
structure. In general, a fine structure implies that either a symmetry is broken (the system
reached a nonsymmetric state) or an external force that broke the symmetry explicitly has
been applied.
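For a small perturbation, the shifts can be estimated by standard first-order degenerate perturbation theory, which the text does not derive. Write the perturbed Hamiltonian as $H + H_1$ with a small $H_1$, and let $P_\lambda$ denote the orthogonal projector onto the eigenspace $\mathcal{H}_\lambda$ of a degenerate level $\lambda$ (assumed finite-dimensional and isolated). Then, to first order:

```latex
\[
  \lambda \;\longmapsto\; \lambda + \varepsilon_j + O\bigl(\|H_1\|^2\bigr),
  \qquad
  \{\varepsilon_j\} = \operatorname{spec}\bigl(P_\lambda H_1 P_\lambda\big|_{\mathcal{H}_\lambda}\bigr).
\]
```

If $H_1$ still commutes with all of $L$, then so does $P_\lambda H_1 P_\lambda$; its eigenspaces are again $L$-modules, and when $\mathcal{H}_\lambda$ is a direct sum of pairwise inequivalent irreducible $L$-modules, Schur's lemma shows that it acts as a scalar on each of them. This is one way in which the irreducible representations of $L$ control the fine structure.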

The induced representation of $L$ on $\mathcal{H}_\lambda$ is unitary. Therefore knowing the irreducible unitary
representations of $L$ can give information about the system under study.

17.5 The hydrogen atom 

A hydrogen atom is a bound state of a proton (the nucleus) and an electron. It is most easily
described by treating the much heavier nucleus as fixed (which amounts to neglecting recoil 
effects) and considering the electron as moving in the electrostatic Coulomb field generated 
by the nucleus. 

The electron is a spin 1/2 particle (a fermion), described by the spin 1/2 representation of
$so(3)$ on the Hilbert space $L^2(\mathbb{R}^3, D_{1/2}) = L^2(\mathbb{R}^3) \otimes D_{1/2}$ defined in Section 15.5. Below
we first discuss the orbital part of the wave functions, i.e., the $L^2(\mathbb{R}^3)$-part. Then we discuss
the dynamical symmetries and how they get broken.

The orbital quantum states are labeled by integers $n$, $l$ and $m$. The integer $n$ takes the
values $1, 2, 3, \dots$ and the number $l$ takes for each fixed value of $n$ the values $0, 1, 2, \dots, n-1$.
Finally, the number $m$ takes for each $l$ the values $-l, -l+1, \dots, l-1, l$. Hence the (orbital)
state of an electron is described by a state

$$|n,m,l\rangle \quad \text{where } n \ge 1,\ 0 \le l < n,\ -l \le m \le l. \qquad (17.1)$$






The quantum number $n$ determines (to a first approximation) the energy of the state:

$$E_n = -\frac{13.6\,\mathrm{eV}}{n^2}. \qquad (17.2)$$

The abbreviation eV means electron volt, a unit of energy. The quantum number $l$
specifies a representation of $so(3)$. Thus we can make use of the representation theory of
$so(3)$ developed in Section 16.3.
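The labels (17.1) and the energies (17.2) are easy to tabulate. The short sketch below (function names hypothetical) counts the orbital states belonging to each $n$, which gives the $n^2$-fold degeneracy of the level $E_n$ (doubled when the electron spin is included), and computes photon energies of transitions between levels; it uses the rounded value 13.6 eV from (17.2).

```python
def energy(n):
    """E_n = -13.6 eV / n**2, as in (17.2)."""
    return -13.6 / n ** 2


def orbital_states(n):
    """All labels (n, l, m) with 0 <= l < n and -l <= m <= l, as in (17.1)."""
    return [(n, l, m) for l in range(n) for m in range(-l, l + 1)]


def photon_energy(n_upper, n_lower):
    """Energy in eV of the photon emitted in the transition n_upper -> n_lower."""
    return energy(n_upper) - energy(n_lower)


for n in range(1, 5):
    print(n, energy(n), len(orbital_states(n)))  # level, energy in eV, degeneracy
print(photon_energy(2, 1))  # the transition 2 -> 1, about 10.2 eV
```

The degeneracy $n^2$ of a level is larger than what rotational symmetry alone explains (a single $l$ would give only $2l+1$); this is the dimension count behind the $SO(4)$ symmetry discussed below.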

The electrostatic potential of the hydrogen atom is $SO(3)$-invariant, hence it is not too
surprising that $SO(3)$-representations play a role: the orbital part of the electron wave
function can be decomposed into representations of $SO(3)$. The quantum number $l$
corresponds precisely to the irreducible representation $D_l$ of $so(3)$, that is, precisely to the
representations of $so(3)$ that lift to $SO(3)$-representations. The quantum number $m$ labels
the $\sigma_3$-eigenvectors of the representation and corresponds to the eigenvalue $m$. The
quantum number $n$ thus determines which $SO(3)$-representations are allowed, and $l$ and $m$
then specify the representation and an eigenvector in this representation.

Now we briefly describe the relation between the quantum numbers and the orbital wave
function of the electron in the hydrogen atom. We can give the hydrogen atom a coordinate
system as follows. We put the proton in the center and describe the position of the electron
by a radial coordinate $r$ measuring the distance between the proton and the electron and
by two angles $\theta \in [0, \pi]$ and $\phi \in [0, 2\pi)$. The solutions to the Schrödinger equation for the
hydrogen atom are then given by

$$\psi_{n,l,m}(r,\theta,\phi) = R_{n,l}(r)\, Y_{l,m}(\theta,\phi).$$
The radial part $R_{n,l}$ of the wave function is completely determined by the quantum numbers
$n$ and $l$ and is given by

$$R_{n,l}(r) = C_{n,l}\, e^{-\rho/2}\, \rho^{l}\, L^{2l+1}_{n-l-1}(\rho).$$

Here $C_{n,l}$ is a constant such that $R_{n,l}$ is normalized, $\int_0^\infty |R_{n,l}(r)|^2\, r^2\, dr = 1$, the $L^{k}_{j}(\rho)$ are
generalized Laguerre polynomials (one of the well-known families of special functions), $\rho$ is
the normalized radius $\rho = 2r/(n a_0)$, and $a_0$ is a constant called the Bohr radius. The angular
part $Y_{l,m}$ of the wave function is given by

$$Y_{l,m}(\theta,\phi) = K_{l,m}\, P_l^m(\cos\theta)\, e^{im\phi},$$

where the $K_{l,m}$ are normalization constants, and the $P_l^m$ are the associated Legendre
polynomials given by

$$P_l^m(x) = \frac{1}{2^l\, l!}\,(1-x^2)^{m/2}\, \frac{d^{l+m}}{dx^{l+m}}\,(x^2-1)^l.$$
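As a numerical sanity check of the radial formula, the sketch below assumes $\rho = 2r/(n a_0)$, $a_0 = 1$, and the standard normalization constant $C_{n,l} = \sqrt{(2/(n a_0))^3\,(n-l-1)!/(2n\,(n+l)!)}$, which belongs to the convention $L^{\alpha}_{k}(0) = \binom{k+\alpha}{k}$ for the Laguerre polynomials. It evaluates $R_{n,l}$ through the three-term recurrence and checks $\int_0^\infty R_{n,l}^2 r^2\,dr \approx 1$ with a midpoint rule; the function names are hypothetical.

```python
import math


def laguerre(k, alpha, x):
    """Generalized Laguerre polynomial L_k^alpha(x) by the standard recurrence
    (j + 1) L_{j+1} = (2j + 1 + alpha - x) L_j - (j + alpha) L_{j-1}."""
    if k == 0:
        return 1.0
    prev, cur = 1.0, 1.0 + alpha - x
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 + alpha - x) * cur - (j + alpha) * prev) / (j + 1)
    return cur


def radial(n, l, r, a0=1.0):
    """R_{n,l}(r) = C_{n,l} exp(-rho/2) rho**l L_{n-l-1}^{2l+1}(rho), rho = 2r/(n a0)."""
    rho = 2.0 * r / (n * a0)
    c = math.sqrt((2.0 / (n * a0)) ** 3 * math.factorial(n - l - 1)
                  / (2.0 * n * math.factorial(n + l)))
    return c * math.exp(-rho / 2.0) * rho ** l * laguerre(n - l - 1, 2 * l + 1, rho)


def norm(n, l, r_max=60.0, steps=20_000):
    """Midpoint-rule approximation of the integral of R_{n,l}(r)**2 r**2 over [0, r_max]."""
    h = r_max / steps
    return h * sum(radial(n, l, (i + 0.5) * h) ** 2 * ((i + 0.5) * h) ** 2
                   for i in range(steps))


for n in range(1, 4):
    for l in range(n):
        print(n, l, norm(n, l))  # each value should be close to 1
```

The same functions also show the familiar node structure: $R_{n,l}$ changes sign exactly $n - l - 1$ times for $r > 0$, one sign change per zero of the Laguerre factor.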



Symmetries and symmetry breaking. Nonrelativistically, the electron in an electromagnetic
field is treated with the Pauli equation. The Pauli equation looks like the Schrödinger
equation, but has some extra terms describing the coupling of a spin 1/2 particle to the
electromagnetic field. We now indicate why, in the case where the external electromagnetic
field is switched off, the symmetry group of the Hamiltonian is $SO(4) \times SO(3)$.
The second factor in the symmetry group, the $SO(3)$, is the symmetry group that acts on
the spin of the electron. That is, it acts on the $D_{1/2}$ part of $L^2(\mathbb{R}^3) \otimes D_{1/2}$.

The first factor in the symmetry group, the $SO(4)$, acts on the space part $L^2(\mathbb{R}^3)$ of the wave
function. The Hamiltonian of the hydrogen atom is rotationally invariant. Infinitesimal
rotations are generated by the angular momentum $L = r \times p$, where $r$ is the position vector
and $p$ is the linear momentum; hence the angular momentum components span the Lie
algebra $so(3)$. However, for the Coulomb potential there is an additional conserved vector,
the Lenz-Runge vector. (Some people call it the Laplace-Runge-Lenz vector, or even the
Laplace vector.) This leads to the bigger group $SO(4)$; see e.g. Goldstein [94].
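Classically, the conservation of the Lenz-Runge vector $A = p \times L - mk\,r/|r|$ for the Kepler potential $V(r) = -k/|r|$ can be checked numerically. The sketch below is an illustration only: it integrates a planar bound orbit (units $m = k = 1$) with the classical fourth-order Runge-Kutta scheme and compares $L_z$ and $A$ at the start and after about one period; all names are hypothetical.

```python
import math


def rhs(s, m=1.0, k=1.0):
    """Hamilton's equations for V(r) = -k/|r| in the plane, s = (x, y, px, py)."""
    x, y, px, py = s
    r3 = (x * x + y * y) ** 1.5
    return (px / m, py / m, -k * x / r3, -k * y / r3)


def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(s)
    k2 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = rhs(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = rhs(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))


def invariants(s, m=1.0, k=1.0):
    """Angular momentum L_z and the in-plane Lenz-Runge vector (A_x, A_y)."""
    x, y, px, py = s
    Lz = x * py - y * px
    r = math.hypot(x, y)
    return Lz, py * Lz - m * k * x / r, -px * Lz - m * k * y / r


s = (1.0, 0.0, 0.0, 1.2)          # bound orbit: energy 0.72 - 1 < 0
start, r_max = invariants(s), 0.0
for _ in range(7500):             # dt = 0.002, total time 15, about one period
    s = rk4_step(s, 0.002)
    r_max = max(r_max, math.hypot(s[0], s[1]))
end = invariants(s)
print(start, end, r_max)
```

The vector $A$ points from the center toward the perihelion and has length $mk$ times the eccentricity (here 0.44), so its conservation is what keeps the Kepler ellipse from precessing.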

To treat the electron relativistically one uses the Dirac equation for a spin 1/2 particle
coupled to an electromagnetic field. The coupling to the electromagnetic field can be done
in a simple way. Starting with the Dirac equation

$$(\hbar\gamma\cdot\partial + mc)\,\psi = 0,$$

one simply replaces the derivatives $\partial_\mu$ with $\partial_\mu - iqA_\mu$, where the zeroth component of $A$ gives
the Coulomb potential and the spatial components of $A$ contain the magnetic field via
$B = \nabla \times A$; the parameter $q$ is interpreted as the charge. We obtain

$$(\hbar\gamma\cdot\partial - iq\hbar\gamma\cdot A + mc)\,\psi = 0,$$

where $\gamma\cdot A = \gamma^\mu A_\mu$.

The effect of having the fully relativistic coupling terms is that there is a coupling between
the spin of the electron and the orbital angular momentum of the electron. The additional
coupling terms in the Hamiltonian are called spin-orbit coupling terms. Due to the
coupling, the separate $SO(3)$ of the spin gets destroyed: without coupling there is a rotational
symmetry group acting separately on the orbit and on the spin, and due to the coupling
the two rotational symmetries are no longer independent. The angular momentum $L$ and
the spin $S$ are no longer separately conserved in magnitude, but $(L + S)^2$ is constant. The
symmetry group of the relativistic hydrogen atom is therefore $SO(4)$. The spectrum that
is observed is called the fine structure spectrum.

Going even further and treating the hydrogen atom with quantum field theory results in a
further breakdown of the symmetry to the group $SO(3)$. The group $SO(4)$ is locally isomorphic¹
to $SO(3) \times SO(3)$, and corrections from quantum field theory break it down to the diagonal
subgroup $SO(3)$. The observed spectrum is called the hyperfine structure spectrum.



17.6 Chains of subalgebras 

In more realistic situations, the Hamiltonian is not invariant under the total dynamical
algebra $E$, the universal enveloping algebra of $L$. In this case, the Hamiltonian is not in
the center of $E$, but we can consider the centralizer of $H$ in $L$. The centralizer of $H$ in $L$ is
a subalgebra of $L$, and is therefore a Lie subalgebra of $E$; we denote it by $L_1$. We thus
have $H \in C_E(L_1)$. The Lie subalgebra $L_1$ generates a subalgebra of $E$, which we denote by
$E_1$.

¹ The Lie algebra isomorphism $so(4) \cong so(3) \oplus so(3)$ is proved along the same lines as displayed
in Section 1.5. On the group level, $SO(4) \cong (SU(2) \times SU(2))/\{\pm(1,1)\}$ is a double cover of
$SO(3) \times SO(3)$.

In simple applications, it often happens that the Hamiltonian $H$ is a function $H(C_0, C_1)$,
where $C_0$ is a Casimir of $L$ (that is, a central element of $E$) and $C_1$ is a Casimir of
$L_1$ (in the center of $E_1$). In more complicated applications we have a series of approximations
to the problem, as explained for the hydrogen atom before, where relativistic and quantum
field theory effects modify the Hamiltonian. In each step one modifies the Hamiltonian by
adding terms with fewer and fewer symmetries, and the symmetry algebra is reduced to
correspondingly smaller subalgebras. We thus have a sequence of subalgebras

$$L = L_0 \supseteq L_1 \supseteq \dots \supseteq L_n = \bar L .$$

The final subalgebra $\bar L$ commutes with $H$, and the subalgebra $\bar E$ of $E$ generated by $\bar L$
centralizes $H$ in $E$. If the Hamiltonian is a function $H = H(C_0, \dots, C_n)$ where $C_k$ is a Casimir
of $L_k$, the scheme gives explicitly solvable problems. For example, for the nonrelativistic
hydrogen atom without spin, one finds a series

$$so(4) \supset so(3) \supset so(2) \supset \{0\}.$$

Of course, there are many Hamiltonians that cannot be represented as functions of a chain 
of Casimirs, but the above scheme covers many applications, and is a starting point for a 
perturbative treatment of many others. 

In classical symplectic mechanics one relates the Lie algebra $L$ to so-called action variables,
and the steps to $L_0$ are constructed using conjugate angle variables. We will not go into
the details of defining these variables and the related techniques.

Consider the situation where $H = H(C_0, C_1)$, that is, the simple application. We write
$H = H_0 + H_1$, where $H_0$ is only a function of $C_0$ and $H_1$ depends on $C_0$ and $C_1$. As before,
we suppose we have realized the elements of $E$ (and thus of $L$) as operators on some Hilbert
space $\mathcal{H}$. We assume that the subspaces $\mathcal{H}_\lambda$ on which the Hamiltonian $H_0 = H_0(C_0)$ acts
diagonally are finite-dimensional. This is for example the case for the hydrogen atom. We
furthermore split up $\mathcal{H}_\lambda$ into irreducible representations of $L_0$, so that we may assume that
$\mathcal{H}_\lambda$ is irreducible. Modifying $H_0$ to $H_0 + H_1$ means that the symmetry algebra becomes
smaller; it becomes $L_1$. We can restrict the representation of $L_0$ on $\mathcal{H}_\lambda$ to the subalgebra
$L_1$ to obtain a representation of $L_1$. In most cases this representation is reducible, and we
write the decomposition of $\mathcal{H}_\lambda$ into $L_1$-irreducibles as

$$\mathcal{H}_\lambda = \bigoplus_\mu \mathcal{H}_{\lambda\mu}.$$

More generally, suppose we have a sequence of subalgebras $L_0 \supset L_1 \supset \dots \supset L_n$ and related
Casimirs $C_i \in Z(U(L_i))$. It follows that the $C_i$ commute among each other: $C_1$ is in the
center of $U(L_1)$, and $U(L_1)$ contains $U(L_k)$ for $k \ge 2$; hence $C_1$ commutes with $C_2$, $C_3$, and
so on, and similarly for the other $C_i$. Thus the Casimirs can be diagonalized simultaneously,
and each $C_i$ acts as a scalar on each irreducible representation of $L_i$. Hence we can assign
to each representation of $L_n$ appearing in the decomposition of the original representation
$\mathcal{H}$ numerical values $\theta_0, \dots, \theta_n$, corresponding to the eigenvalues of the $C_i$. Given a physical
state $v$ in a representation of $L_n$ corresponding to the label $(\theta_0, \dots, \theta_n)$, we see that the
Hamiltonian acts as

$$H(C_0, \dots, C_n)\, v = H(\theta_0, \dots, \theta_n)\, v,$$

where on the left-hand side the Hamiltonian is an operator and on the right-hand side
$H(\theta_0, \dots, \theta_n)$ is a number.
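For the spinless hydrogen chain $so(4) \supset so(3) \supset so(2)$ the labels $(\theta_0, \theta_1, \theta_2)$ can be written down explicitly. The sketch below assumes the standard eigenvalues $\theta_0 = n^2 - 1$ of the $so(4)$ Casimir $L^2 + A^2$ (with the Lenz-Runge vector $A$ suitably rescaled and $\hbar = 1$), $\theta_1 = l(l+1)$ of $L^2$, and $\theta_2 = m$ of $L_3$, together with Pauli's relation $E = -13.6\,\mathrm{eV}/(\theta_0 + 1)$. The extra terms with coefficients `beta` and `gamma` are purely hypothetical; they only show how adding terms in $C_1$ and $C_2$ splits the $n^2$-fold degenerate level.

```python
from collections import Counter


def chain_labels(n):
    """Labels (theta0, theta1, theta2) of the states of level n along so(4) > so(3) > so(2)."""
    return [(n * n - 1, l * (l + 1), m) for l in range(n) for m in range(-l, l + 1)]


def hamiltonian_value(theta, beta=0.0, gamma=0.0):
    """H(theta0, theta1, theta2) in eV: Bohr term plus hypothetical beta*theta1 + gamma*theta2."""
    theta0, theta1, theta2 = theta
    return -13.6 / (theta0 + 1) + beta * theta1 + gamma * theta2


def multiplicities(n, **coeffs):
    """Sorted degeneracies of the distinct values H(theta) on level n."""
    values = Counter(round(hamiltonian_value(t, **coeffs), 12) for t in chain_labels(n))
    return sorted(values.values())


print(multiplicities(2))                          # full so(4) symmetry: one level, 4 states
print(multiplicities(2, beta=0.01))               # so(4) broken to so(3)
print(multiplicities(2, beta=0.01, gamma=0.001))  # further broken to so(2)
```

Each step down the chain refines the labels that the Hamiltonian can distinguish, which is exactly the mechanism behind the successive fine structures described in the text.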

Branching rules. In many favorable cases, for example when the $L_i$ are simple, the
decomposition of an irreducible representation of the large algebra into irreducible
representations of a maximal subalgebra is known. These decompositions go under the name of
branching rules; splitting up a representation under a subgroup or subalgebra is called
branching.

Let us give an example of a branching rule and look at the fundamental representation
of $su(3)$, that is, the 3-dimensional representation of $su(3)$ that defines the Lie algebra
$su(3)$. The Lie algebra elements are faithfully represented as $3 \times 3$-matrices $X$ that are
antihermitian and traceless, $X^* + X = 0$, $\operatorname{tr} X = 0$. Now we consider the Lie subalgebra $su(2)$.
There are different ways to embed $su(2)$ into $su(3)$, and they are not all equivalent; for
example, $so(3) \cong su(2)$ sits inside $su(3)$ as the real antisymmetric matrices and acts irreducibly
on $\mathbb{C}^3$. For the standard embedding considered here, we can choose a basis $e_1, e_2, e_3$ of $\mathbb{C}^3$
such that $su(2)$ only acts nontrivially on the subspace spanned by $e_1$ and $e_2$. We thus
realize $su(2)$ inside $su(3)$ as the matrices

$$X = \begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}, \qquad A \in su(2).$$

We see that the three-dimensional representation of $su(3)$ splits into two irreducible
representations of $su(2)$: the trivial one, spanned by $e_3$, and the two-dimensional (fundamental)
representation, spanned by $e_1$ and $e_2$. One writes this in shorthand as $3 \to 2 + 1$ under
$su(2) \subset su(3)$. In the reference Slansky [229], one can find tables of branching rules.
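The block embedding behind $3 \to 2 + 1$ can be checked directly. The sketch below uses pure Python with nested lists; the basis $t_k = -\tfrac{i}{2}\sigma_k$ built from the Pauli matrices $\sigma_k$ is one convenient choice, not fixed by the text, and the helper names are hypothetical. It places each $t_k$ in the upper-left $2 \times 2$ block of a $3 \times 3$ zero matrix, and one can verify that the resulting matrices are antihermitian and traceless (so they lie in $su(3)$), satisfy the same commutation relations $[X_1, X_2] = X_3$ (cyclically), and annihilate $e_3$.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def bracket(A, B):
    """Commutator [A, B] = AB - BA of two square matrices."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]


def embed(A):
    """Put a 2x2 matrix into the upper-left block of a 3x3 zero matrix."""
    return [[A[0][0], A[0][1], 0], [A[1][0], A[1][1], 0], [0, 0, 0]]


PAULI = ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])
# Basis t_k = -(i/2) sigma_k of su(2): antihermitian, traceless, [t_1, t_2] = t_3.
T = [[[-0.5j * x for x in row] for row in s] for s in PAULI]
X = [embed(t) for t in T]  # the corresponding elements of su(3)

e3 = [[0], [0], [1]]
print([matmul(x, e3) for x in X])  # every X maps e3 to zero: the trivial summand "1"
```

Since the third row and column of every embedded matrix vanish, the span of $e_1, e_2$ is invariant and carries the fundamental representation "2", while $e_3$ spans the trivial representation "1".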

Clebsch-Gordan coefficients. In an important special case one can relate the branching 
rules to the so-called Clebsch-Gordan coefficients, which are widely used in physics. 

Let us explain the Clebsch-Gordan coefficients for $su(2)$. Given two representations $D_k$ and
$D_l$ (see Section 16.3), we can form the tensor product $D_k \otimes D_l$. An $su(2)$-element $x$ acts
on $v \otimes w$ by mapping $v \otimes w$ to $x(v) \otimes w + v \otimes x(w)$, where we write $x(v)$ for the action of
$x$ on $v$. In general, the representation $D_k \otimes D_l$ is not irreducible, and we have

$$D_k \otimes D_l = D_{k+l} \oplus D_{k+l-2} \oplus \dots \oplus D_{|k-l|} .$$

The precise decomposition of a vector $v \otimes w$, where $v$ and $w$ are eigenvectors of $\sigma_3$, into the
irreducible components is given by the Clebsch-Gordan coefficients. For the vector in the
$D_k$ representation with $\sigma_3$ eigenvalue $m$ and norm one we write $|k, m\rangle$. If the representation
$D_K$ is inside the tensor product of $D_{k_1}$ and $D_{k_2}$, we can decompose any vector $|K, M\rangle$ as a
sum of vectors of the form $|k_1, m_1\rangle \otimes |k_2, m_2\rangle$, and the Clebsch-Gordan coefficients are then
the coefficients in the decomposition

$$|K, M\rangle = \sum_{m_1, m_2} C^{K,M}_{k_1,k_2,m_1,m_2}\, |k_1, m_1\rangle \otimes |k_2, m_2\rangle .$$
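For $su(2)$ these coefficients have a closed form, Racah's formula. The sketch below uses the usual physics labeling by spin $j$ and magnetic quantum number $m$ (dimension $2j+1$, Condon-Shortley phase convention), which may differ from the labeling of Section 16.3; the function name is hypothetical. Exact rational arithmetic is used for all factorials, and only the final square root is taken in floating point.

```python
from fractions import Fraction
from math import factorial, sqrt


def clebsch_gordan(j1, m1, j2, m2, J, M):
    """<j1 m1; j2 m2 | J M> by Racah's formula (Condon-Shortley phases).
    Spins and magnetic quantum numbers may be integers or half-integers
    (pass Fractions or exact floats such as 0.5)."""
    j1, m1, j2, m2, J, M = (Fraction(v) for v in (j1, m1, j2, m2, J, M))
    if m1 + m2 != M or not abs(j1 - j2) <= J <= j1 + j2:
        return 0.0
    if (j1 + j2 - J).denominator != 1:
        return 0.0
    for j, m in ((j1, m1), (j2, m2), (J, M)):
        if abs(m) > j or (j - m).denominator != 1:
            return 0.0

    def f(x):
        return factorial(int(x))  # x is a non-negative integer here

    pre = Fraction((2 * J + 1) * f(J + j1 - j2) * f(J - j1 + j2) * f(j1 + j2 - J),
                   f(j1 + j2 + J + 1))
    pre *= f(J + M) * f(J - M) * f(j1 - m1) * f(j1 + m1) * f(j2 - m2) * f(j2 + m2)
    total = Fraction(0)
    for k in range(int(j1 + j2 - J) + 1):
        args = (j1 + j2 - J - k, j1 - m1 - k, j2 + m2 - k,
                J - j2 + m1 + k, J - j1 - m2 + k)
        if min(args) < 0:
            continue  # terms with negative factorial arguments vanish
        denom = f(k)
        for a in args:
            denom *= f(a)
        total += Fraction((-1) ** k, denom)
    return float(total) * sqrt(pre)


h = Fraction(1, 2)
print(clebsch_gordan(h, h, h, -h, 1, 0))  # triplet component of |1/2, 1/2> (x) |1/2, -1/2>
print(clebsch_gordan(h, h, h, -h, 0, 0))  # singlet component of the same product vector
```

Only terms with $m_1 + m_2 = M$ can be nonzero, because the $\sigma_3$ eigenvalues add under the tensor product action described above; the function returns 0 for all other combinations.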

More generally, the Clebsch-Gordan decompositions say how the tensor product of two
irreducible representations $V_1$ and $V_2$ of a compact Lie group (or its Lie algebra) decomposes
into irreducible representations $W_j$ as $V_1 \otimes V_2 = \bigoplus_j W_j$. The Clebsch-Gordan coefficients
are the numerical coefficients of the embedding of one of the summands $W_j$ into $V_1 \otimes V_2$,
expressed in bases of these spaces.

Now suppose that $L_0 = L' \oplus L'$ decomposes into two copies of the same Lie algebra $L'$.
An important choice of $L_1$ is the diagonal Lie subalgebra given by elements of the form
$(a, a)$ with $a \in L'$. Then as a Lie algebra $L_1 \cong L'$. The irreducible representations of
$L_0$ are given by tensor products of irreducible representations of $L'$. Therefore, decomposing
the irreducible representations of $L_0$ with respect to $L_1$ amounts to giving the Clebsch-Gordan
decompositions.
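For $su(2)$ and the hydrogen atom this can be made concrete by counting. The sketch below labels irreducible representations by their spin $j$ (dimension $2j+1$); this labeling is an assumption here and may differ from the one in Section 16.3 (if $D_k$ denotes the representation of dimension $k+1$, then $k = 2j$, and steps of 2 in $k$ become steps of 1 in $j$). It also uses the standard fact, not derived in the text, that on the level $n$ of the hydrogen atom each of the two commuting copies of $so(3)$ inside $so(4)$ acts by the spin $j = (n-1)/2$ representation. Restricting to the diagonal subalgebra then yields exactly $l = 0, \dots, n-1$, and the dimension count reproduces the $n^2$-fold degeneracy.

```python
from fractions import Fraction


def cg_series(j1, j2):
    """Spins J with D_J contained in D_j1 (x) D_j2: |j1 - j2|, ..., j1 + j2 in steps of 1."""
    j1, j2 = Fraction(j1), Fraction(j2)
    spins, J = [], abs(j1 - j2)
    while J <= j1 + j2:
        spins.append(J)
        J += 1
    return spins


def dim(j):
    """Dimension 2j + 1 of the spin-j representation."""
    return int(2 * j + 1)


# Hydrogen level n: so(4) = so(3) + so(3) acts as D_j (x) D_j with j = (n - 1)/2;
# restricting to the diagonal so(3) gives the orbital quantum numbers l = 0, ..., n - 1.
for n in range(1, 5):
    j = Fraction(n - 1, 2)
    print(n, [str(J) for J in cg_series(j, j)], sum(dim(J) for J in cg_series(j, j)))

# Spin-orbit coupling: D_l (x) D_1/2 = D_(l - 1/2) + D_(l + 1/2) for l >= 1.
print([str(J) for J in cg_series(1, Fraction(1, 2))])
```

The last line illustrates the spin-orbit case discussed in Section 17.5: a $p$-level ($l = 1$) combines with the electron spin into total angular momenta $1/2$ and $3/2$.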

Much more could be said on the topic of symmetries and broken symmetries in physics.
A nice overview is given in a paper by Buker [31]. It shows how the symmetry concept
organizes not only the world of atoms and molecules that we considered here, but also
that of elementary particles. The isospin symmetry between protons and neutrons has the
symmetry group $SU(2)$, which extends to the flavor symmetry group $SU(3)$ for the three
light quarks.

Quantum field theory, culminating in the standard model, is also based on symmetries,
namely the space-time symmetries of the Poincaré group and a gauge group $U(2) \times SU(3)$,
which combines the broken symmetry group $U(2)$ (locally isomorphic to $U(1) \times SU(2)$) of the
weak interaction (of which only a diagonal subgroup $U(1)$ encoding the electromagnetic
charge is unbroken) with the unbroken color symmetry group $SU(3)$ of the strong interaction.

While these topics lie far beyond the scope of this book, the interested reader is invited to
take the next step and consult the work of others who study them in depth. Our journey is
finished.